Final Report


In this year's effort, research focused on four areas.

1. Adaptive Filtering and Estimation.

Work on the theory of running FFTs continued, with emphasis on problems
in adaptive filtering. Results were presented in the following paper:

"Adaptive Frequency Domain Estimators"

IEEE International Symposium on Information Theory, Grignano, Italy, 1979.

The method was applied to the problem of detecting a moving target in
the presence of strong clutter. The filtering was based not on global but
on local spectral properties of the clutter, determined adaptively with threshold
and other techniques. Results were presented in the following paper:

"Adaptive Clutter Suppression"

Seventh DARPA Strategic Space Symposium, Naval Postgraduate School,
Monterey, California, 1980.
29. ADAPTIVE CLUTTER SUPPRESSION

By: A. Papoulis, K. Huang, and Ch. Chamzas

Abstract: A method of target detection is presented based on the
determination of the local spectral properties of the background interference.
In this method, the running FFT of the detector output is
evaluated recursively, and the target is detected with a
threshold technique that separates the significant components of the
local target and clutter spectra. In the illustrations, the motion of
the target is used to generate a high-frequency response at the output
of each detector element etched with a mask that matches the point
spread of the optical system.


1. Introduction

We consider the problem of detecting a target in the presence of strong
interference. Unlike the usual methods, the proposed approach is based
on the design of a filter whose parameters are not specified in advance
in terms of global statistics but are adaptively controlled in terms of
local spectra evaluated in real time.

The problem is essentially multi-dimensional (space-time). However, for
notational simplicity, we discuss only its one-dimensional form (time).
The results can, in principle, be extended to several variables.

The one-dimensional problem in its post-detection form involves the
estimation of a signal s(t), or, at least, the determination of the
presence of such a signal, in terms of the detector output

$$x(t) = c(t) + s(t) + v(t) \qquad (1)$$

where c(t) is the detector output due to clutter, and v(t) is background
noise. The processing is carried out digitally in terms of the samples

$$x[n] = x(nT)$$

of x(t). Thus, the signal processing problem is the detection of the
component s[n] of the sum

$$x[n] = c[n] + s[n] + v[n] \qquad (2)$$
The factors affecting the selection of the sampling interval T will not be
considered.

The most common form of target detection uses an FIR filter whose output is
the weighted sum

$$y[n] = \sum_{k=0}^{N-1} a_k\, x[n-k] \qquad (3)$$

The coefficients of this filter are independent of n and are chosen so as
to yield a suitable frequency response

$$H(e^{j\omega}) = \sum_{k=0}^{N-1} a_k\, e^{-jk\omega}$$

A special case is the mth difference filter, obtained with $N = m+1$ and $a_k = (-1)^k \binom{m}{k}$. The
resulting system function is given by

$$H(z) = (1 - z^{-1})^m$$

and can be realized as a cascade of first-order systems. This filter is
chosen primarily because it is simple (it requires no multiplication).
Its frequency response is a rather primitive high-pass curve.
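To make the mth difference filter concrete, here is a minimal Python/NumPy sketch (our illustration, not code from the report; the function names are hypothetical). It builds the coefficients $a_k = (-1)^k \binom{m}{k}$ of $(1 - z^{-1})^m$, checks that the FIR form (3) and a cascade of m first differences (subtractions only, no multiplications) give the same output, and evaluates the magnitude response, which for this filter equals $(2\sin(\omega/2))^m$ on $0 \le \omega \le \pi$.

```python
import numpy as np
from math import comb


def mth_difference_coeffs(m):
    """Coefficients a_k = (-1)^k C(m, k), k = 0..m, of H(z) = (1 - z^-1)^m."""
    return np.array([(-1) ** k * comb(m, k) for k in range(m + 1)], dtype=float)


def cascade_first_differences(x, m):
    """Apply the first-order system y[n] = x[n] - x[n-1] m times.

    Uses only subtractions; samples before n = 0 are taken as zero.
    """
    y = np.asarray(x, dtype=float)
    for _ in range(m):
        y = y - np.concatenate(([0.0], y[:-1]))
    return y


def magnitude_response(a, omega):
    """|H(e^{jw})| = |sum_k a_k e^{-jkw}| at the frequencies in omega."""
    k = np.arange(len(a))
    return np.abs(np.exp(-1j * np.outer(omega, k)) @ a)


m = 3
a = mth_difference_coeffs(m)                # a_k = 1, -3, 3, -1
x = np.random.default_rng(0).standard_normal(200)
fir = np.convolve(x, a)[: len(x)]           # weighted sum (3), causal part
assert np.allclose(fir, cascade_first_differences(x, m))

omega = np.linspace(0.0, np.pi, 9)
assert np.allclose(magnitude_response(a, omega), (2 * np.sin(omega / 2)) ** m)
```

The closed form makes the "primitive high-pass" shape explicit: the response rises monotonically from 0 at $\omega = 0$ to $2^m$ at $\omega = \pi$, with no parameter other than m to shape it.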


In the target detection problem it is desirable to adapt the system characteristics
to the local properties of the background. This requires the
design of a time-varying filter

$$y[n] = \sum_{k=0}^{N-1} a_k[n]\, x[n-k] \qquad (4)$$

with adaptively controlled coefficients $a_k[n]$. The adaptation algorithms
involve various numerical schemes for determining local statistics but are,
in general, complex. A simple design, which can be used if the signal s[n]
to be estimated is somehow available (as a pilot, for example, or as a delayed
observation), is the Widrow filter:

$$a_k[n] = a_k[n-1] + \mu\,\bigl(s[n] - y[n]\bigr)\, x[n-k] \qquad (5)$$

where $\mu$ is a suitable constant.
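The update (5) is what is now commonly called the LMS algorithm. The Python/NumPy sketch below (hypothetical function name) runs the time-varying filter (4) with the update (5). As is usual for LMS, y[n] is computed with the coefficients $a_k[n-1]$ before they are updated; this ordering is our reading of (5). The usage lines are a noise-free system-identification check: when the pilot s[n] is produced by a fixed FIR filter driven by x[n], the adapted coefficients should approach that filter.

```python
import numpy as np


def widrow_filter(x, s, num_taps, mu):
    """Time-varying FIR filter (4) adapted with the Widrow update (5).

    x        : input samples x[n]
    s        : reference (pilot) samples s[n] that the output should track
    num_taps : N, the number of coefficients a_0..a_{N-1}
    mu       : step size (the constant mu in (5))
    Returns the output y[n] and the final coefficients a_k.
    """
    x = np.asarray(x, dtype=float)
    a = np.zeros(num_taps)
    taps = np.zeros(num_taps)          # taps[k] holds x[n-k]
    y = np.zeros(len(x))
    for n in range(len(x)):
        taps = np.roll(taps, 1)
        taps[0] = x[n]
        y[n] = a @ taps                # output with the previous coefficients
        a = a + mu * (s[n] - y[n]) * taps   # a_k[n] = a_k[n-1] + mu (s[n] - y[n]) x[n-k]
    return y, a


rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
h_true = np.array([0.5, -0.3, 0.2])
s = np.convolve(x, h_true)[: len(x)]   # pilot signal from a known 3-tap filter
y, a = widrow_filter(x, s, num_taps=3, mu=0.02)
print(np.round(a, 3))                  # should be close to h_true
```

The step size trades speed against stability: with unit-power white input, each coefficient error shrinks on average by roughly a factor $(1 - \mu)$ per sample, and too large a $\mu$ makes the recursion diverge.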
In the above filters, the processing is performed in the time domain.
This is not optimum for the problem under consideration, because the
separation between target and clutter depends on frequency-domain properties.
FIR filters as in (3) or (4) can, of course, separate frequency
components, but this requires proper adjustment of all their coefficients
$a_k$. The proposed method operates directly in the frequency
domain. As we shall see, the elimination of various frequency components
is accomplished simply by eliminating the corresponding coefficients.
This drastically reduces the number of adaptively controlled parameters.


2. Running Spectra

The running FFT of a signal x[n] is, by definition, the sum

$$X_m[n] = \sum_{k=-M}^{M} x[n-k]\, w^{mk}, \qquad w = e^{j2\pi/N}, \quad N = 2M+1 \qquad (6)$$

Thus, $X_m[n]$ is the mth FFT coefficient of N consecutive samples of x[n]
centered at n. The proposed adaptive frequency domain filter is a time-varying
system whose output is the sum

$$z[n] = \sum_{m=-M}^{M} b_m[n]\, X_m[n] \qquad (7)$$

where the weights $b_m[n]$ are adaptively controlled in a variety of ways, depending
on the application. For example, if the Widrow algorithm is used,
then $b_m[n]$ is determined by a first-order recursion as in (5):

$$b_m[n] = b_m[n-1] + \mu\,\bigl(s[n] - z[n]\bigr)\, X_m[n] \qquad (8)$$


It might appear that (7) is equivalent to (4), obtained merely by a linear
transformation of the data. This, however, is not so. If it is concluded,
either from prior information or from recent observations, that the frequency
components of the interference are concentrated in certain frequency
bands, the corresponding terms in (7) can be eliminated. For real x[n], the
terms with negative m are the complex conjugates of those with positive m
(taking $b_{-m}[n] = b_m^*[n]$), and this leads to the response

$$z[n] = 2\,\mathrm{Re} \sum_{m=M_1[n]}^{M_2[n]} b_m[n]\, X_m[n] \qquad (9)$$

where not only the coefficients $b_m[n]$ but also the cut-off frequencies
$M_1[n]$ and $M_2[n]$ are adaptively controlled.
In the last section, we estimate the presence of s[n] in terms of the sum

$$z[n] = \frac{2}{N} \sum_{m=M_1[n]}^{M_2[n]} \mathrm{Re}\, X_m[n] \qquad (10)$$

This is a special case of (9) obtained with $b_m = 1/N$, and it equals s[n] if
the frequencies $M_1$ and $M_2$ completely separate the spectra of the signal and
the interference. The determination of $M_1$ and $M_2$ is accomplished with a
threshold method that is based on the determination of local clutter
averages.
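As a check on (10), here is a direct (non-recursive) evaluation of the running FFT (6) and of the band sum (10) with fixed cut-offs, in Python/NumPy; the function names are ours. Because $\sum_{m=-M}^{M} X_m[n] = N\,x[n]$ and $X_{-m}[n] = X_m^*[n]$ for real x[n], keeping the whole band $1 \le m \le M$ returns x[n] minus its local mean over the N-point window, so a constant (or very slowly varying) clutter term is removed whenever $M_1 \ge 1$.

```python
import numpy as np


def running_fft_direct(x, M):
    """Direct evaluation of (6): X_m[n] = sum_{k=-M}^{M} x[n-k] w^{mk}, w = e^{j2pi/N}.

    Returns X with X[i, m + M] = X_m[n] for n = M + i, i.e. only the
    positions n = M .. len(x)-M-1 whose window lies inside the data.
    """
    x = np.asarray(x, dtype=float)
    N = 2 * M + 1
    k = np.arange(-M, M + 1)
    W = np.exp(2j * np.pi * np.outer(k, k) / N)     # W[k, m] = w^{mk}
    return np.array([x[n - k] @ W for n in range(M, len(x) - M)])


def band_sum(X, M, M1, M2):
    """z[n] of (10): (2/N) * sum_{m=M1}^{M2} Re X_m[n], with fixed cut-offs."""
    N = 2 * M + 1
    return (2.0 / N) * X[:, M + M1 : M + M2 + 1].real.sum(axis=1)


rng = np.random.default_rng(2)
x = rng.standard_normal(60)
M = 5
z_full = band_sum(running_fft_direct(x, M), M, 1, M)
local_mean = np.array([x[n - M : n + M + 1].mean() for n in range(M, len(x) - M)])
assert np.allclose(z_full, x[M : len(x) - M] - local_mean)
```

In the adaptive scheme, $M_1$ and $M_2$ would change with n; the fixed cut-offs here only isolate the arithmetic of (10).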

The advantages of the proposed filter are obvious: the processing is done in the
frequency domain, and it is based not on global prior statistics but on local
averages. However, it appears that, in contrast to time-domain filtering,
the required number of arithmetic operations is large: N multiplications
are required to determine $X_m[n]$ for each m and n. We shall presently
show that this is not so. Each FFT coefficient $X_m[n]$ can be determined recursively
with only one multiplication. Indeed, from (6) it follows that

$$X_m[n] = w^m X_m[n-1] + x[n+M]\, w^{-mM} - x[n-M-1]\, w^{m(M+1)} \qquad (11)$$

that is, $X_m[n]$ can be obtained as the output of a simple first-order filter.
To realize (11) in real time, we must, of course, introduce a delay of M
units.
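The recursion (11) can be checked numerically against the direct sum (6). The Python/NumPy sketch below (hypothetical function name) starts from one directly computed window and then updates all N coefficients with (11). It uses the identity $w^{m(M+1)} = w^{-mM}$, which holds because $w^N = 1$, so the incoming and outgoing samples share a common factor. Each update costs a fixed number of operations per frequency instead of N.

```python
import numpy as np


def running_fft_recursive(x, M):
    """Running FFT (6) via the first-order recursion (11).

    X_m[n] = w^m X_m[n-1] + (x[n+M] - x[n-M-1]) w^{-mM},  w = e^{j2pi/N}, N = 2M+1.
    Returns X with X[i, m + M] = X_m[n] for n = M + i (windows inside the data).
    """
    x = np.asarray(x, dtype=float)
    N = 2 * M + 1
    m = np.arange(-M, M + 1)
    w_m = np.exp(2j * np.pi * m / N)              # w^m
    w_minus_mM = np.exp(-2j * np.pi * m * M / N)  # w^{-mM} = w^{m(M+1)}
    k = np.arange(-M, M + 1)
    # Initial window n = M, computed directly from (6).
    X = np.exp(2j * np.pi * np.outer(m, k) / N) @ x[M - k]
    out = [X]
    for n in range(M + 1, len(x) - M):
        X = w_m * X + (x[n + M] - x[n - M - 1]) * w_minus_mM
        out.append(X)
    return np.array(out)


rng = np.random.default_rng(3)
x = rng.standard_normal(80)
M = 4
N = 2 * M + 1
k = np.arange(-M, M + 1)
direct = np.array([[np.sum(x[n - k] * np.exp(2j * np.pi * mm * k / N)) for mm in k]
                   for n in range(M, len(x) - M)])
assert np.allclose(running_fft_recursive(x, M), direct)
```

Because $|w^m| = 1$, rounding errors are not amplified by the recursion, although in a long real-time run they accumulate slowly; an occasional direct recomputation of one window is a common safeguard.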

3. The Gemini Concept

Frequency domain filtering can be used in most methods of target detection,
because the suppression of the interference is based on the assumption
that the clutter component c[n] of the detector output x[n] varies slowly
relative to the target component s[n]. However, to be concrete, we shall
consider a special case based on the Gemini principle (Fig. 1).

Each detector element is covered with a mask consisting of vertical strips
with transparency m(x) that is somehow matched to the point spread

$$h(x, y) = h\bigl(\sqrt{x^2 + y^2}\bigr)$$

of the optical system, and its output x(t) equals the integral of the light
intensity across its surface. For simplicity, we assume that the center
of the detector is at the origin (x = 0, y = 0) and that the target is a
point source. The results can be readily generalized to arbitrary moving
targets. Denoting by $v_x$, $v_y$ the velocity components of the target, properly
scaled, we conclude that its image is $h(x - v_x t,\, y - v_y t)$. Hence, the detector
output is given by the integral

$$s(t) = \iint_D h(x - v_x t,\, y - v_y t)\, m(x)\, dx\, dy \qquad (12)$$

where D is the region of the detector element. With


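$$p(x) = \int_{-\infty}^{\infty} h(x, y)\, dy \qquad (13)$$

the line spread of the system, we obtain from (12), neglecting end effects,

$$s(t) = \int_{-\infty}^{\infty} p(x - v_x t)\, m(x)\, dx = \phi(v_x t) \qquad (14)$$

where

$$\phi(x) = \int_{-\infty}^{\infty} p(x - \xi)\, m(\xi)\, d\xi \qquad (15)$$

(the two forms agree because p(x) is even, h being circularly symmetric).
Denoting by H(u, v) the MTF of the system and by P(u), $\Phi(u)$, and M(u) the
Fourier transforms of p(x), $\phi(x)$, and m(x), respectively, we obtain

$$P(u) = H(u, 0), \qquad \Phi(u) = P(u)\, M(u) \qquad (16)$$

The spectrum $S(\omega)$ of the detector output s(t) is thus given by

$$S(\omega) = \frac{1}{|v_x|}\, \Phi\!\left(\frac{\omega}{v_x}\right) = \frac{1}{|v_x|}\, H\!\left(\frac{\omega}{v_x}, 0\right) M\!\left(\frac{\omega}{v_x}\right) \qquad (17)$$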

This shows that a high velocity component $v_x$ in the x-direction generates
high frequencies in the component s(t) of x(t) due to the target. Hence,
c(t) can be removed with frequency domain processing. The y-component of
the velocity has no effect on the spectrum of s(t).
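The passage from (14) to (17) is the scaling property of the Fourier transform. Written out with the substitution $\tau = v_x t$ (assuming $v_x \ne 0$):

```latex
S(\omega) = \int_{-\infty}^{\infty} \phi(v_x t)\, e^{-j\omega t}\, dt
          = \frac{1}{|v_x|} \int_{-\infty}^{\infty} \phi(\tau)\, e^{-j(\omega/v_x)\tau}\, d\tau
          = \frac{1}{|v_x|}\, \Phi\!\left(\frac{\omega}{v_x}\right)
          = \frac{1}{|v_x|}\, H\!\left(\frac{\omega}{v_x}, 0\right) M\!\left(\frac{\omega}{v_x}\right)
```

Hence, if $\Phi(u)$ is negligible for $|u| > u_0$, then $S(\omega)$ is negligible for $|\omega| > |v_x|\,u_0$: the bandwidth of s(t) grows in proportion to $|v_x|$, while $v_y$ does not appear at all.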

In the next section, we illustrate the above with a numerical example involving
a one-dimensional mask as in Fig. 1. It might, however, be of
interest to comment briefly on the possibility of detecting targets moving
in any direction. As we show next, this can be done with masks consisting
of circles:
$$m(x, y) = m(r), \qquad r = \sqrt{x^2 + y^2} \qquad (18)$$

that are matched somehow to the point spread h(r).

FIGURE 1 FOCAL PLANE IMAGE OF A MOVING TARGET.


We change the coordinates to $(\xi, \eta)$, where $\xi$ is in the direction of motion
of the target. With v its velocity and $\eta_0$ the distance from the origin to
the line of motion, the image at time t is $h(\xi - vt,\, \eta - \eta_0)$, and the detector
output is the integral

$$x(t) = \iint h(\xi - vt,\, \eta - \eta_0)\, m\bigl(\sqrt{\xi^2 + \eta^2}\bigr)\, d\xi\, d\eta = \psi(vt, \eta_0)$$

where $\psi(\xi, \eta) = \psi(r)$ is the (circularly symmetric) convolution of h and m.
Thus, the detector output x(t) is the profile $\psi(\xi, \eta_0)$ of $\psi(r)$ on the
plane $\eta = \eta_0$, properly scaled. This curve is shown in Fig. 2 as a function
of $\xi$ for various values of $\eta_0$. The point spread used is the Airy pattern

$$h(r) = \left[\frac{2 J_1(r)}{r}\right]^2$$

and the mask m(r) is a succession of transparent and opaque rings with
boundaries at the zeros of $J_1(r)$.

FIGURE 2 (a) CIRCULAR MASK. (b) DETECTOR OUTPUT $\psi(\xi, \eta)$ DUE TO A MOVING POINT SOURCE.


4. Numerical Results

In this section, we illustrate the adaptive frequency domain method with
an example involving the detection of a moving target in the presence of
strong interference. The data are computer generated.

The samples of the detector output form a discrete signal x[n] as in (2)
(Fig. 3), and our objective is to detect the presence of the target.

FIGURE 3 (a) TARGET s[n] AND SIZE OF THE RUNNING FFT. (b) ETCHED MASK.
(c) CLUTTER c[n] AND BACKGROUND NOISE v[n]. (d) DETECTOR'S OUTPUT x[n].
(e) ENERGY OF s[n] AND c[n]+v[n], AVERAGED OVER THE FFT SIZE.

The numerical processing is as follows.

We form the running FFT $X_m[n]$ of x[n] of order

$$N = 101$$

using the first-order recursion (11), and we form the sum z[n] as in (10).
The cut-off frequencies $M_1[n]$ and $M_2[n]$ are determined adaptively. The
upper cut-off point depends on the noise component v[n]. For simplicity,
we choose a fixed value $M_2[n] = 45$, limiting the discussion to the choice
of $M_1[n]$. For this purpose, we form the intermediate average (Fig. 4)


$$\bar{X}_m[n] = \sum_{k=0}^{\infty} \alpha^k\, |X_m[n-k]|, \qquad \alpha = 0.99 \qquad (22)$$

of $|X_m[n]|$, and we choose for $M_1[n]$ the smallest value such that

$$|\bar{X}_m[n]| < L \quad \text{for } M_1[n] \le m < M_2[n] \qquad (23)$$

where L is a threshold level.

In Fig. 5a we show $|\bar{X}_m[n]|$ as a function of m and n, with $|\bar{X}_m[n]|$
truncated to the threshold level L. In Fig. 5b we show $|\bar{X}_m[n_0]|$ for
$n_0 = 700$, and in Fig. 5c we plot the values of the lower cut-off point
$M_1[n]$ as a function of n.
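The cut-off selection can be sketched as follows (Python/NumPy, hypothetical names). Two points are our assumptions rather than statements of the report: the average (22) is taken over the magnitudes $|X_m[n]|$ and computed recursively as $\bar X_m[n] = \alpha \bar X_m[n-1] + |X_m[n]|$, and only the nonnegative indices $0 \le m < M_2$ are scanned. $M_1[n]$ is found by walking down from $M_2 - 1$ while the averaged magnitude stays below L, which yields the smallest $M_1$ satisfying (23).

```python
import numpy as np


def intermediate_average(X_mag, alpha=0.99):
    """Exponential average over n of |X_m[n]|, one column per frequency m.

    X_mag[n, m] = |X_m[n]|.  Recursive form of (22):
    Xbar[n] = alpha * Xbar[n-1] + X_mag[n], starting from zero.
    """
    X_mag = np.asarray(X_mag, dtype=float)
    Xbar = np.empty_like(X_mag)
    acc = np.zeros(X_mag.shape[1])
    for n in range(X_mag.shape[0]):
        acc = alpha * acc + X_mag[n]
        Xbar[n] = acc
    return Xbar


def lower_cutoff(Xbar_n, M2, L):
    """Smallest M1 with Xbar_n[m] < L for every M1 <= m < M2, as in (23).

    Xbar_n[m] is the averaged magnitude at index m = 0 .. M2-1 for one n.
    Returns M2 if the bin just below M2 already exceeds the threshold.
    """
    M1 = M2
    while M1 > 0 and Xbar_n[M1 - 1] < L:
        M1 -= 1
    return M1


# Strong low-frequency clutter in bins 0-2, weak components above.
Xbar_n = np.array([50.0, 20.0, 5.0, 0.5, 0.2, 0.3, 0.1, 0.1])
print(lower_cutoff(Xbar_n, M2=8, L=1.0))   # 3
```

Scanning down from $M_2$ guarantees that every bin kept in (10) is below the clutter threshold, even if an isolated low bin also happens to fall below L.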

The resulting sum

$$z[n] = \frac{2}{N} \sum_{m=M_1[n]}^{M_2[n]} \mathrm{Re}\, X_m[n]$$

is due primarily to the target, but it contains a component e[n] (error)
due to the frequency components of c[n] and v[n] in the band $(M_1, M_2)$.

We next form the short-term and long-term averages (Fig. 6)

$$\hat{y}[n] = \sum_{k=0}^{\infty} y[n-k]\, \alpha_1^k, \qquad \alpha_1 = 0.9$$

$$\bar{y}[n] = \sum_{k=0}^{\infty} y[n-k]\, \alpha_2^k, \qquad \alpha_2 = 0.999$$

of the energy $y[n] = z^2[n]$. These sums are determined recursively:

$$\hat{y}[n] = \alpha_1 \hat{y}[n-1] + y[n], \qquad \bar{y}[n] = \alpha_2 \bar{y}[n-1] + y[n]$$

Since the target is of short duration, the long-term average $\bar{y}[n]$ is due
mainly to the clutter. If

$$\hat{y}[n] > k\, \bar{y}[n], \qquad k = 3$$

then the target is present (Fig. 7).
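A sketch of this detector in Python/NumPy (hypothetical names). One assumption departs from the recursions above: we scale the new sample by $(1 - \alpha)$ in both averages, so that $\hat y$ and $\bar y$ estimate the same mean energy when no target is present. Without this normalization the two unweighted sums differ in steady state by a factor of about $(1-\alpha_1)/(1-\alpha_2) = 100$, and the factor k would have to absorb that ratio. Both averages are also started at the mean energy of a warm-up segment, during which no decision is made; this too is our choice.

```python
import numpy as np


def detect_target(z, alpha1=0.9, alpha2=0.999, k=3.0, warmup=100):
    """Short-/long-term energy detector on the adaptive-filter output z[n].

    y[n] = z[n]**2
    yhat[n] = alpha1 * yhat[n-1] + (1 - alpha1) * y[n]   (short-term average)
    ybar[n] = alpha2 * ybar[n-1] + (1 - alpha2) * y[n]   (long-term average)
    The (1 - alpha) factors are an assumption (see text).
    Returns a boolean array that is True where yhat[n] > k * ybar[n].
    """
    y = np.asarray(z, dtype=float) ** 2
    yhat = ybar = y[:warmup].mean()        # start both at the initial mean energy
    flags = np.zeros(len(y), dtype=bool)
    for n in range(len(y)):
        yhat = alpha1 * yhat + (1 - alpha1) * y[n]
        ybar = alpha2 * ybar + (1 - alpha2) * y[n]
        flags[n] = n >= warmup and yhat > k * ybar
    return flags


# Residual clutter/noise of rms 0.1 with a short target burst at n = 2000..2029.
rng = np.random.default_rng(4)
z = 0.1 * rng.standard_normal(3000)
z[2000:2030] += 1.0
flags = detect_target(z)
print(flags[2005:2030].all())
```

Because the long-term average reacts about a hundred times more slowly, a short burst raises $\hat y$ well above $3\bar y$ before $\bar y$ has time to follow.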
FIGURE 5 (a) $|\bar{X}_m[n]|$: INTERMEDIATE AVERAGE OF THE FREQUENCY COMPONENTS (n IS IN STEPS OF 20).
(b) $|\bar{X}_m[n]|$ FOR $n = n_0$.
(c) $M_1[n]$: LOWER CUT-OFF POINT OF THE RUNNING FILTER.
FIGURE 6 GENERATION OF THE SHORT-TERM, $\hat{y}[n]$, AND LONG-TERM, $\bar{y}[n]$, AVERAGES.

FIGURE 7 (a) z[n]: OUTPUT OF THE ADAPTIVE FILTER. (b) COMPARISON OF $\hat{y}[n]$ AND $3\bar{y}[n]$.
2. Bandlimited Extrapolation.

The investigation of the problems of extrapolating bandlimited signals
by iteration was completed. The method was applied to problems in image
enhancement, spectral estimation, deconvolution, and detection of hidden
periodicities, among others. The latest results are shown below:

"Windows and Extrapolation"

IEEE Workshop on Spectral Estimation, Cypress Gardens, Florida, 1980.

"Detection of Hidden Periodicities by Adaptive Extrapolation"

IEEE Trans. ASSP-27, No. 5, October 1979, pp. 492-500.


Detection of Hidden Periodicities by
Adaptive Extrapolation

ATHANASIOS PAPOULIS, FELLOW, IEEE, AND CHRISTODOULOS CHAMZAS

Abstract: A method is presented for determining the harmonic components
of a noisy signal by nonlinear extrapolation beyond the data
interval. The method is based on an algorithm that adaptively reduces
the spectral components due to noise.

I. Introduction

An important problem in many applications is the determination
of the frequency components of a signal

$$f(t) = \sum_{i} c_i\, e^{j\omega_i t} \qquad (1)$$

in terms of the segment (data)

$$w_1(t) = \begin{cases} f(t) + n(t), & |t| < T \\ 0, & |t| > T \end{cases} \qquad (2)$$

of f(t) containing the noise component n(t). The data are
known for |t| < T only, for a variety of reasons:

1) The signal f(t) can be written as a sum of exponentials
for a limited time only (voice; nonstationary processes).

2) The available time of observation is limited (sun spots;
weather trends).

3) Measurements are limited by instrument constraints
(Michelson interferometer; diffraction-limited imaging).

Manuscript received November 22, 1978; revised January 25, 1979
and March 20, 1979. This work was supported by the Advanced Research
Projects Agency of the Department of Defense and was monitored
by the Office of Naval Research under Contract N00014-76C-0144.
This paper is in part from a Ph.D. dissertation submitted by C.
Chamzas to the Faculty of the Polytechnic Institute of New York,
Farmingdale, NY.

The authors are with the Department of Electrical Engineering,
Polytechnic Institute of New York, Farmingdale, NY 11735.
Fig. 1. (a) The unknown signal f(t) and its Fourier transform $F(\omega)$. (b) First iteration starting with the known segment $w_1(t)$.

Fig. 2. nth iteration.


The unknown frequencies $\omega_i$ and coefficients $c_i$ can be
determined simply with ordinary Fourier transforms if the
time of observation 2T is large compared to all the periods
$T_i = 2\pi/\omega_i$ and their differences. This is not, however, the
case if T is of the order of $T_i - T_j$, particularly if the noise component
n(t) is not negligible. In this paper we present a
method which, as we hope to show, is reliable even if T is
small and the data are noisy.

The method involves only FFTs, and it is based on earlier
results dealing with the problem of extrapolating band-limited
functions [1], [2]. We review (for easy reference) the relevant
parts of these results.

II. Extrapolation of Band-Limited Functions

Consider a function f(t) with the Fourier transform $F(\omega)$
such that

$$F(\omega) = 0, \qquad |\omega| > \sigma. \qquad (3)$$

We form the function

$$w_1(t) = \begin{cases} f(t), & |t| < T \\ 0, & |t| > T \end{cases} \qquad (4)$$

obtained by truncating f(t) as in Fig. 1. We shall determine
f(t) in terms of $w_1(t)$ by numerical iteration.

First Step: We compute the Fourier transform $W_1(\omega)$ of
$w_1(t)$ and form the function

$$F_1(\omega) = \begin{cases} W_1(\omega), & |\omega| < \sigma \\ 0, & |\omega| > \sigma \end{cases} \qquad (5)$$

We compute the inverse transform $f_1(t)$ of $F_1(\omega)$ and form
the function

$$w_2(t) = \begin{cases} w_1(t) = f(t), & |t| < T \\ f_1(t), & |t| > T \end{cases} \qquad (6)$$

and its Fourier transform $W_2(\omega)$.
This completes the First Step of the iteration (Fig. 1).

nth Step: We form the function (Fig. 2)

$$F_n(\omega) = \begin{cases} W_n(\omega), & |\omega| < \sigma \\ 0, & |\omega| > \sigma \end{cases} \qquad (7)$$

where $W_n(\omega)$ is the function obtained at the end of the preceding
step, and we compute the inverse transform $f_n(t)$ of $F_n(\omega)$.
We form the function

$$w_{n+1}(t) = \begin{cases} f(t), & |t| < T \\ f_n(t), & |t| > T \end{cases} \qquad (8)$$

and compute its Fourier transform $W_{n+1}(\omega)$.

If f(t) is approximated by $f_n(t)$, the resulting mean-square
error is given by

$$E_n = \int_{-\infty}^{\infty} |f(t) - f_n(t)|^2\, dt = \frac{1}{2\pi} \int_{-\sigma}^{\sigma} |F(\omega) - F_n(\omega)|^2\, d\omega \qquad (9)$$

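We maintain that this error decreases twice at each iteration
step. Indeed,

$$E_n = \int_{|t|<T} |f(t) - f_n(t)|^2\, dt + \int_{|t|>T} |f(t) - f_n(t)|^2\, dt.$$

But [see (7) and (8)]

$$\int_{|t|>T} |f(t) - f_n(t)|^2\, dt = \int_{-\infty}^{\infty} |f(t) - w_{n+1}(t)|^2\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |F(\omega) - W_{n+1}(\omega)|^2\, d\omega$$

$$= \frac{1}{2\pi} \int_{|\omega|<\sigma} |F(\omega) - W_{n+1}(\omega)|^2\, d\omega + \frac{1}{2\pi} \int_{|\omega|>\sigma} |W_{n+1}(\omega)|^2\, d\omega,$$

because $F(\omega) = 0$ for $|\omega| > \sigma$. And since the first integral on the right equals $E_{n+1}$ [see (7) and (9)], we obtain

$$E_n - E_{n+1} = \int_{|t|<T} |f(t) - f_n(t)|^2\, dt + \frac{1}{2\pi} \int_{|\omega|>\sigma} |W_{n+1}(\omega)|^2\, d\omega. \qquad (10)$$

In [1] and [2] we show that $f_n(t) \to f(t)$ as $n \to \infty$. This is
not true if the given segment $w_1(t)$ of f(t) is noisy as in (2).
In this case, a satisfactory estimate of f(t) can be found by
early termination of the iteration [2].

Note: From (10) it follows that the mean-square error $E_n$
is a monotonically decreasing function of n and, since it is positive, it
tends to a limit. This does not prove that $E_n$ tends to zero,
because the limit need not be zero. It shows, however, that

$$E_n - E_{n+1} \to 0, \qquad n \to \infty.$$

Hence,

$$\int_{|t|<T} |f(t) - f_n(t)|^2\, dt \to 0, \qquad n \to \infty. \qquad (11)$$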


Although the functions f(t) and $f_n(t)$ are band limited, (11)
does not imply that $f_n(t) \to f(t)$, because there is no lower
bound on the energy concentration of band-limited functions
in a finite interval [1], [3]. For example, the prolate spheroidal
functions $\psi_n(t)$ are band limited; their energy equals one,
but their energy concentration in the interval (-T, T) tends
to zero as $n \to \infty$. This is the case because the eigenvalues
of the underlying integral equation tend to zero as $n \to \infty$.

We mention without elaboration that, in the discrete version
of the problem, the convergence of the iteration can be deduced
from (11) under suitable conditions. The reason is
that the corresponding eigenvalues are finitely many; therefore,
they have a positive minimum [4].

Fig. 3. Fourier transform of the unknown signal.

Fig. 4. Truncation of $|W_n(\omega)|$ below a threshold level yielding
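In the discrete setting the iteration takes only a few lines with the FFT. The Python/NumPy sketch below (function name hypothetical; the scheme is widely known as the Papoulis-Gerchberg algorithm) implements (5)-(8) on an N-point periodic grid, with boolean masks for the known samples and for the band. For noise-free band-limited data, the discrete error $E_n = \sum |f - f_n|^2$ cannot increase, because each step applies the band truncation and the time truncation in turn. The example is a small case in which no nonzero signal in the band can vanish on all the known samples, so the error also goes to zero.

```python
import numpy as np


def bandlimited_extrapolation(w1, known, band, n_iter):
    """Discrete form of the iteration (5)-(8) on an N-point FFT grid.

    w1    : length-N data, equal to f on the known samples and 0 elsewhere
    known : boolean mask, True on the samples where the data are given
    band  : boolean mask over FFT bins, True inside the band (-sigma, sigma)
    Returns the successive estimates f_1, f_2, ..., f_{n_iter}.
    """
    w1 = np.asarray(w1, dtype=complex)
    w = w1.copy()
    estimates = []
    for _ in range(n_iter):
        F = np.fft.fft(w)
        F[~band] = 0.0                    # (5)/(7): truncate the spectrum
        f_n = np.fft.ifft(F)              # inverse transform f_n
        w = np.where(known, w1, f_n)      # (8): keep the data, extend with f_n
        estimates.append(f_n)
    return estimates


N = 64
t = np.arange(N)
f = 1.0 + 0.8 * np.cos(2 * np.pi * t / N + 0.3) + 0.5 * np.cos(4 * np.pi * t / N + 1.1)
band = np.minimum(t, N - t) <= 2           # bins 0, +-1, +-2
known = np.ones(N, dtype=bool)
known[40:56] = False                       # 16 missing samples
est = bandlimited_extrapolation(np.where(known, f, 0.0), known, band, n_iter=300)
errors = [np.sum(np.abs(f - f_n) ** 2) for f_n in est]   # discrete E_n
print(errors[0], errors[-1])
```

With noisy data the same loop would be stopped early, as noted after (10); the sketch does not attempt to choose that stopping point.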


III. Adaptive Extrapolation

The preceding method was based on the assumption that
the unknown function f(t) is band limited. This information
was used to reduce the error in the estimation of f(t)
twice at each iteration step. The speed of iteration can be
increased and the effects of noise can be reduced if additional
a priori information about f(t) is available. Suppose,
for example, that the size of the band of $F(\omega)$ is known but
its precise location is unknown. We then choose a constant
$\sigma$ such that $F(\omega)$ vanishes outside the interval $(-\sigma, \sigma)$ and
proceed as in Section II. As the iteration progresses, the
form of $W_n(\omega)$ suggests appropriate reduction of the assumed
band of f(t).

The adaptive extrapolation method is particularly effective
if f(t) is a sum of exponentials as in (1). In this case, $F(\omega)$
consists of impulses (lines) as in Fig. 3:

$$F(\omega) = 2\pi \sum_{i} c_i\, \delta(\omega - \omega_i) \qquad (12)$$

and our problem is to determine their locations $\omega_i$ and amplitudes
$c_i$ in terms of the known segment $w_1(t)$ of f(t).

To solve this problem, we select a constant $\sigma$ larger than
the largest possible value of $\omega_i$, and we proceed with the
iteration until $|F_n(\omega)|$ takes significant values only in a subset
$B_n$ of the band $(-\sigma, \sigma)$ of f(t) (Fig. 4). This suggests that
the unknown frequencies are in $B_n$. When this is observed,
the function $F_n(\omega)$ of the nth iteration step is obtained from
the following modification of (7):


$$F_n(\omega) = \begin{cases} W_n(\omega), & \omega \in B_n \\ 0, & \omega \in \bar{B}_n \end{cases} \qquad (13)$$

(Fig. 4), where $\bar{B}_n$ is the complement of $B_n$. The process is
repeated with occasional reduction of the size of $B_n$ as further
evidence suggests, and it terminates when $w_n(t)$ is essentially
a sum of exponentials. Another application of the method
is discussed in [5] in the context of deconvolution.
Discussion

The adaptive extrapolation method is essentially empirical.
Although, as we see in the following examples, it works well
in a number of cases, there is no a priori certainty that in a
given problem it will converge to the unknown signal. In fact,
if some of the components $c_i$ of f(t) are relatively small, they
might be lost.

The accuracy and reliability of the method depend on a
number of parameters: the total number of unknown frequencies,
possible prior knowledge of this number, the relative sizes of the amplitudes
$c_i$ and frequencies $\omega_i$, the noise level, the length 2T of the
data interval, and the available FFT size N. A precise statement,
even empirical, of the importance of all these factors cannot
be made: it would depend on many parameters. We are in
the process of determining, empirically, the limits of the
method for a number of special cases. We comment below,
briefly, on certain empirical criteria for selecting the set $B_n$
and on the limitations due to sampling.

For the subset $B_n$ introduced in (13), we select the set of
points such that the magnitude of $W_n(\omega)$ exceeds a threshold
level $\varepsilon_n$:

$$B_n = \{\omega : |W_n(\omega)| > \varepsilon_n\}, \qquad \bar{B}_n = \{\omega : |W_n(\omega)| < \varepsilon_n\} \qquad (14)$$

The choice of $\varepsilon_n$ is dictated by two conflicting requirements:
for speedy convergence and noise reduction, $\varepsilon_n$ must
be large; yet it must be sufficiently small so that all frequency
components of f(t) remain in $B_n$. In the examples given below,
we used the following method for determining $\varepsilon_n$.

We first find the minimum $M_{n-1}$ of $|W_{n-1}(\omega)|$ in the set $B_{n-1}$:

$$M_{n-1} = \min_{\omega \in B_{n-1}} |W_{n-1}(\omega)| \qquad (15)$$

If $\varepsilon_{n-1}$ is greater than $qM_{n-1}$, where q is a constant less
than one, then we do not change the threshold level. If $\varepsilon_{n-1}$
is less than $qM_{n-1}$, then we choose $\varepsilon_n = qM_{n-1}$. Thus,

$$\varepsilon_n = \max\{\varepsilon_{n-1},\, qM_{n-1}\} \qquad (16)$$

In the examples, q is chosen between 0.9 and 0.99.
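The adaptive version changes only the spectral truncation: instead of a fixed band, the support $B_n$ of (14) is recomputed at each step with the threshold rule (15)-(16). The Python/NumPy sketch below is our illustration under stated assumptions: spectra are divided by N/2 before thresholding (as in the examples of Section IV), so that $\varepsilon$ is comparable to sine-wave amplitudes; $B_n$ is chosen from the whole band at each step, not only from $B_{n-1}$; and the function names are hypothetical. It is not claimed to reproduce the authors' numerical results.

```python
import numpy as np


def next_threshold(eps_prev, W_prev, B_prev, q):
    """Threshold update (15)-(16): eps_n = max(eps_{n-1}, q * M_{n-1}),
    where M_{n-1} is the minimum of |W_{n-1}| over the retained set B_{n-1}."""
    if not B_prev.any():
        return eps_prev
    M_prev = np.abs(W_prev[B_prev]).min()
    return max(eps_prev, q * M_prev)


def adaptive_extrapolation(w1, known, band, eps1, q, n_iter):
    """Sketch of the adaptive iteration (13)-(16) on an N-point FFT grid.

    Spectra are divided by N/2 before thresholding.  n_iter must be >= 1.
    Returns (f_n, B_n) from the last step.
    """
    w1 = np.asarray(w1, dtype=complex)
    N = len(w1)
    w, eps, W_prev, B_prev = w1.copy(), eps1, None, None
    for _ in range(n_iter):
        W = np.fft.fft(w) / (N / 2)                         # normalized W_n
        if B_prev is not None:
            eps = next_threshold(eps, W_prev, B_prev, q)    # (15)-(16)
        B = band & (np.abs(W) > eps)                        # (14): retained set B_n
        F = np.where(B, W, 0.0)                             # (13): zero outside B_n
        f_n = np.fft.ifft(F * (N / 2))                      # undo the normalization
        w = np.where(known, w1, f_n)                        # restore the known segment
        W_prev, B_prev = W, B
    return f_n, B


# Sanity check: with complete data and one on-grid line, one step recovers f exactly.
N = 64
t = np.arange(N)
f = np.exp(2j * np.pi * 5 * t / N)
f1, B = adaptive_extrapolation(f, np.ones(N, bool), np.ones(N, bool), eps1=0.15, q=0.99, n_iter=1)
print(np.flatnonzero(B))   # [5]
```

Since $\varepsilon_n$ never decreases, a true line whose magnitude dips below the threshold during the transient is lost for good, which is one way the small components mentioned in the Discussion can disappear.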
Numerical Considerations

The numerical implementation of the method involves the
discrete signals

$$f_n = f(nt_0), \qquad F_n = F(n\omega_0)$$

obtained by sampling f(t) and $F(\omega)$.

Suppose, first, that the problem is inherently discrete, i.e.,
that we wish to find the spectrum of a sequence $f_n$ from incomplete
data. Clearly, the discrete versions of the iteration
and of the band-limited assumption are self-evident. However,
the assumption that f(t) is a sum of sine waves has no
obvious discrete version. It corresponds, loosely, to the assumption
that the smallest distance between the nonzero frequencies
is large compared to one (no "neighboring frequencies" are
present). If this is the case, then the unknown frequencies
can be determined exactly, provided that the data interval is
not too small and the noise level is reasonable.
Fig. 5. Discrete spectrum $|F_m|$ (a = 0.3, N = 256).

We turn now to our main problem: the numerical determination
of the frequencies of an analog signal. We assume
that the FFT size N is specified. It suffices, therefore, to
select the size $t_0$ of the sampling interval. As we know [1],
the frequency interval is then determined, because $\omega_0 = 2\pi/(Nt_0)$.
Since the data interval is 2T, the number M of
available samples equals $2T/t_0$. The choice of M is guided
by the following considerations: if $M \ll N$, then the iteration
might converge to the wrong frequencies; if M is large,
then the aliasing errors are large.

It appears from our experience that M = N/4 is a reasonable
compromise, and it leads to $t_0 = 8T/N$. However, as we
shall see, to increase the resolution we might use a larger
value for $t_0$.

The accuracy of the method and the attainable resolution
depend on the relationship between the unknown frequencies
$\omega_i$ and the sampling frequency $\omega_0$. If all unknown frequencies
are multiples of $\omega_0$,

$$\omega_i = r_i\, \omega_0,$$

then the problem is essentially discrete. If the unknown frequencies
and their differences are large compared to $\omega_0$, then
the error is small, because it is of the order of $\omega_0$.

The problem of determining $\omega_i$ is difficult if $\omega_0$ is of the
order of $\omega_i$ and $\omega_i$ is not an integer multiple of $\omega_0$:

$$\omega_i = (r_i + a)\, \omega_0, \qquad |a| < \tfrac{1}{2}.$$

In this case, the resolution error $\omega_0/2$ is of the order of $\omega_i$.
Furthermore, aliasing generates spurious frequencies in the
vicinity of $\omega_i$. Indeed, if

$$f(t) = e^{j\omega_i t}$$

then

$$f_n = f(nt_0) = e^{j(r_i + a)\omega_0 n t_0} = w^{(r_i + a)n}, \qquad w = e^{j2\pi/N},$$

yielding the discrete spectrum (Fig. 5)

$$F_m = \sum_{n=0}^{N-1} f_n\, w^{-mn} = \frac{1 - w^{-(m - r_i - a)N}}{1 - w^{-(m - r_i - a)}}$$

To improve the accuracy, we can repeat the process with a
larger value of $t_0$, using as the starting $B_n$ the set containing only
the estimated frequencies $\omega_i$ and their neighbors.
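The closed form for $F_m$ is a geometric sum, and it is easy to check against an FFT. The Python/NumPy sketch below (hypothetical function name) evaluates it for the values of Fig. 5 (a = 0.3, N = 256, with an arbitrarily chosen r = 20) and measures how much of the energy stays in the nearest bin r; the remainder is spread over neighboring bins as the spurious components near $\omega_i$ described above.

```python
import numpy as np


def off_grid_spectrum(r, a, N):
    """N-point DFT of f_n = w^{(r+a)n}, n = 0..N-1, w = e^{j2pi/N}, in closed form.

    F_m = (1 - w^{-(m-r-a)N}) / (1 - w^{-(m-r-a)}),  valid for 0 < |a| < 1/2
    (for a = 0 the line falls on bin r and the formula becomes 0/0 there).
    """
    m = np.arange(N)
    d = m - r - a
    return (1 - np.exp(-2j * np.pi * d)) / (1 - np.exp(-2j * np.pi * d / N))


N, r, a = 256, 20, 0.3
F = off_grid_spectrum(r, a, N)
n = np.arange(N)
assert np.allclose(F, np.fft.fft(np.exp(2j * np.pi * (r + a) * n / N)))
frac = np.abs(F[r]) ** 2 / np.sum(np.abs(F) ** 2)
print(round(frac, 3))    # roughly three quarters of the energy stays in bin r
```

In magnitude, $|F_m| = |\sin \pi a| / |\sin(\pi(m - r - a)/N)|$, so the leakage decays only like $1/|m - r|$ away from the line.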

IV. Illustrations

We illustrate the method with several examples involving
signals whose unknown frequencies cannot be determined
from direct Fourier analysis. In these illustrations we consider
several noise levels. With

$$w_1(t) = f(t) + n(t)$$

the given data, we define the signal-to-noise ratio S/N as the
ratio of the energies of f(t) and n(t) in the data interval. In
all examples, the noise is white and is uniformly distributed
in the interval (-c, c). The ratio S/N is changed by changing
the size of c.

Fig. 6. Given segment $w_1(t)$ and its Fourier transform $W_1(\omega)$.

Fig. 7. Result of the iteration for n = 20 and n = 70.

Fig. 8. Given data segment for S/N = 15, 11, and 5 dB.
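Generating test data of this kind takes only a few lines. The Python/NumPy sketch below (hypothetical function name) picks c from a requested S/N, using the fact that noise uniform on (-c, c) has mean square $c^2/3$, and defines S/N with the expected rather than the realized noise energy. Because the S/N in the examples refers to realized energies over a particular data segment, this sketch is not claimed to reproduce the c values quoted in the examples exactly.

```python
import numpy as np


def uniform_noise_for_snr(f_segment, snr_db, rng):
    """Uniform white noise on (-c, c) for a target S/N in dB over the data interval.

    S/N is the ratio of the signal energy to the expected noise energy over the
    same M samples; a uniform variable on (-c, c) has mean square c**2 / 3, so
    c = sqrt(3 * E_f / (M * 10**(snr_db / 10))).  Returns c and the noise samples.
    """
    f_segment = np.asarray(f_segment, dtype=float)
    M = len(f_segment)
    E_f = np.sum(f_segment ** 2)
    c = np.sqrt(3.0 * E_f / (M * 10 ** (snr_db / 10)))
    return c, rng.uniform(-c, c, size=M)


rng = np.random.default_rng(5)
t = np.arange(51) / 256                                # M = 51 samples, t0 = 1/256 s
f = 1.5 * np.cos(30 * np.pi * t + np.pi / 3) + 1.25 * np.cos(20 * np.pi * t + np.pi / 6)
c, noise = uniform_noise_for_snr(f, 15.0, rng)
w1 = f + noise                                         # noisy data segment
```

Repeating the last two lines with different generator seeds gives the "different samples of noise" over which the examples report their success rates.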

The computations are carried out with

$$N = 256, \qquad f_0 = 1\ \text{Hz}, \qquad t_0 = 1/256\ \text{s}.$$

To avoid large scaling factors, we divided all frequency components
by N/2. In the examples we also show the value of
the parameter q [see (16)] and of the initial threshold level
$\varepsilon_1$.

Example 1: The unknown signal is a sum of two sine waves

$$f(t) = 1.5\cos(30\pi t + 60^\circ) + 1.25\cos(20\pi t + 30^\circ)$$

and the unknown frequencies $f_1 = 10$ Hz and $f_2 = 15$ Hz are
integer multiples of the sampling frequency $f_0 = \omega_0/2\pi$.

a) We first assume that the data interval contains M = 51
sampling points and n(t) = 0.

In Fig. 6 we show the given segment of the unknown signal
and its spectrum. As we see from the figure, the frequencies
$f_1$ and $f_2$ are not visible. The initial threshold is $\varepsilon_1 = 0.15$, and
its value at the nth iteration is obtained from (16) with q = 0.99.
In Fig. 7 we show the results of the iteration for n = 20
and n = 70. At the 70th iteration the frequencies, amplitudes,
and phases of f(t) are recovered exactly.

We note that, in this case, the values of $\varepsilon_1$ and q are not
critical. Any value of q between 0.9 and 0.99 and of $\varepsilon_1$ between
0.05 and 0.15 is adequate. The iteration was also performed
with a data interval containing M = 41 sampling
points. In this case, the results are similar, but the speed of
convergence is slower.

b)  We  consider,  next,  noisy  data  with  various  S/N  ratios  as 
in  Fig.  8.  In  all  cases, 

Af  =  51  ju  =  0.99  €,=0.15. 

The iteration was performed several times with the same signal but with different samples of noise. As the following indicates, the results are not the same for all samples.

S/N = 15 dB ($c = 0.375$). Six samples were tried. In five of these, the frequencies $f_1$ and $f_2$ were found exactly.

S/N = 11 dB ($c = 0.625$). Fourteen samples were tried. In nine, we obtained $f_1$ and $f_2$ exactly. In four cases, an error of 1 Hz developed. In one case, the iteration yielded not two but three frequencies: $f_1 = 9$ Hz, $f_2 = 14$ Hz, and $f_3 = 15$ Hz.

S/N = 5 dB ($c = 1.25$). This is a very noisy case. Of the eleven samples tried, three gave the correct answer, two yielded a 1 Hz error, five resulted in a 2 Hz error, and in one case the frequency $f_2 = 15$ Hz was lost.

Fig. 9. (a) Given segment of $f(t)$. (b) Fourier transform of $w_1(t)$.

Fig. 10. Result of the iteration for $n = 30$ and $n = 100$.
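The iteration itself and the threshold rule (16) are defined earlier in the paper and are not reproduced in this section. Purely as an illustration of the general idea, the following NumPy sketch implements a generic thresholded Papoulis-Gerchberg-type extrapolation: it keeps only the FFT components whose magnitude, after division by $N/2$, exceeds the current threshold, and then re-imposes the known samples. The function name, the placement of the $M$ samples at the start of the $N$-point frame, and the geometric schedule `eps1 * mu**n` are assumptions standing in for (16), not the authors' definitions; no claim is made that the sketch reproduces the results above.

```python
import numpy as np


def threshold_iteration(data, N=256, n_iter=70, eps1=0.15, mu=0.99):
    """Hypothetical thresholded Papoulis-Gerchberg-type iteration.

    Assumptions (not the report's definitions): the M known samples fill the
    first M slots of an N-point frame, and the threshold at step n is
    eps1 * mu**n, used here in place of rule (16).
    """
    M = len(data)
    x = np.zeros(N)
    x[:M] = data
    C = np.zeros(N, dtype=complex)
    for n in range(n_iter):
        C = np.fft.fft(x) / (N / 2)        # components divided by N/2, as in the text
        eps = eps1 * mu ** n               # assumed threshold schedule
        C[np.abs(C) < eps] = 0.0           # keep only the significant components
        x = np.fft.ifft(C * (N / 2)).real  # extrapolated estimate over the frame
        x[:M] = data                       # re-impose the known samples
    return C, x


# Example 1(a) data: 51 noiseless samples taken every 1/256 s
t = np.arange(51) / 256
data = 1.5 * np.cos(30 * np.pi * t + np.pi / 3) + 1.25 * np.cos(20 * np.pi * t + np.pi / 6)
C, x = threshold_iteration(data)
peaks = np.nonzero(C[:128])[0]  # surviving 1 Hz bins below half the sampling rate
```

Inspecting `peaks` shows which 1 Hz bins survive the thresholding under these assumptions; changing `eps1` and `mu` gives a feel for how sensitive such a scheme is to its threshold schedule.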

Example 2: In this example $f(t)$ consists of three sine waves and the data are noiseless. We consider two cases. In the first case, the unknown frequencies are multiples of $\omega_0$. In the second case, they are not.

a)

$$f(t) = 1.5\cos 4\pi t + 1.5\cos(18\pi t + 60^\circ) + 1.25\cos(28\pi t + 30^\circ).$$

We start with the following values of the relevant parameters:

$M = 59, \quad \mu = 0.95, \quad \varepsilon_1 = 0.20.$

In Fig. 9 we plot the given segment $f(t)$ and its spectrum. Fig. 10 shows the results of the iteration for $n = 30$ and $n = 100$. At the 100th iteration the frequencies, amplitudes, and phases of $f(t)$ are recovered exactly.

Again, the values of $\mu$ and $\varepsilon_1$ are not critical. Essentially the same results are obtained if the data interval is reduced to $M = 51$, provided that $\mu$ is not less than 0.95.

The method was also tried with a smaller data interval. However, the convergence is slow and the result inaccurate: with $M = 41$, $\mu = 0.99$, $\varepsilon_1 = 0.20$, the component with the lowest frequency is lost.

b)

$$f(t) = 1.5\cos 4.8\pi t + 1.5\cos(18\pi t + 60^\circ) + 1.25\cos(29.2\pi t + 30^\circ).$$

In this case,

$$f_1 = (2 + 0.4)\,f_0, \qquad f_2 = 9 f_0, \qquad f_3 = (14 + 0.6)\,f_0.$$

We used $M = 59$, $\mu = 0.95$, and $\varepsilon_1 = 0.20$.

With an FFT size $N = 256$, we obtained after 350 iteration steps the frequencies 2 Hz, 9 Hz, and 15 Hz (Fig. 11c).

Increasing the FFT size to $N = 512$, we found in 200 steps the frequencies 2.5 Hz, 8.75 Hz, 9.25 Hz, and 14.5 Hz (Fig. 11d).

We note that the accuracy in the evaluation of coefficients of different levels can be improved if the threshold level $\varepsilon_n$ is not constant throughout the band but takes different values in the vicinity of each frequency. This is demonstrated in the next example.

Example 3: The unknown signal is a sum of five sine waves,

$$f(t) = 1.5\cos 4\pi t + 1.25\cos(12\pi t + 30^\circ) + 0.375\cos(40\pi t + 60^\circ) + 0.625\cos 50\pi t + 1.25\cos(60\pi t + 45^\circ),$$

with frequencies 2, 6, 20, 25, and 30 Hz; the noise is zero. In Fig. 12 we show the given data, obtained with $M = 71$, and their spectrum. In the iteration we assume that $\mu = 0.99$ and $\varepsilon_1 = 0.04$. The threshold level at the $n$th iteration is defined as in (16). However, it is not constant throughout the band: its value is determined from the behavior of $W_n(\omega)$ in the vicinity of each maximum (Fig. 13).

In Fig. 13 we show the iteration for $n = 10$ and $n = 20$. At the 50th step (Fig. 14) we recover the frequencies 2, 6, 25, and 30 Hz. As is clear from the figure, $W_n(\omega)$ contains a peak in the vicinity of $f = 20$ Hz. To determine its exact location, we introduce the following variation to the method: we subtract from the given data the recovered portion of $f(t)$ and repeat the iteration starting with the new data $d(t)$ so obtained.
Fig. 11. (a) Given segment $w_1(t)$. (b) Fourier transform of $w_1(t)$. (c) Result of the iteration for $n = 350$ and FFT size $N = 256$. (d) Result of the iteration for $n = 200$ and FFT size $N = 512$.

In Fig. 15 we show $d(t)$ and its spectrum $D(\omega)$. The unknown frequency $f = 20$ Hz is recovered at the 20th step (Fig. 16).

The iteration was also performed with a smaller data segment ($M = 61$). The results were similar, but the convergence was slower.

Fig. 12. (a) Given segment $w_1(t)$. (b) Fourier transform of $w_1(t)$.

Fig. 13. Result of the iteration for $n = 10$ and $n = 20$.

Example 4: To test the limits of the method, we consider as a last case an example where the data interval is less than one-half the unknown period and the unknown frequency is not a multiple of $\omega_0$, so that the aliasing is significant. We assume that

$$f(t) = 1.25\cos(5.4\pi t + 30^\circ), \qquad |t| < T = 0.08 \text{ s}.$$

This yields $M = 41$ sampling points in the data interval.

Fig. 15. (a) New data segment $d(t)$. (b) Fourier transform of $d(t)$.

Fig. 16. Result of the iteration for $n = 20$.

Fig. 17. Given data segment $w_1(t)$ for various noise levels: (a) S/N = 22 dB; (b) S/N = 12 dB; (c) S/N = 8 dB; (d) S/N = 2 dB.

The iteration was performed with $\mu = 0.99$ and $\varepsilon_1 = 0.05$. We considered four different signal levels (Fig. 17).

a) $n(t) = 0$. At the 40th iteration we recover the frequency $f = 3$ Hz. This is the nearest sampling frequency to the unknown $f_1 \approx 2.7$ Hz. However, since the resolution frequency $f_0 = 1$ Hz is of the order of $f_1$, the error is large. To reduce it, we increase the sampling interval from 1/256 s to 1/128 s. This yields $f_0 = 1/2$ Hz, but the number of sampling points is reduced to $M = 21$. The iteration starts from the band $B_0$ consisting of the location $f \approx 3$ Hz of the recovered frequency and its two neighbors $f = 2.5$ Hz and $f = 3.5$ Hz. After $n = 150$ steps, we recovered the frequency $f \approx 2.5$ Hz (nearest to the unknown $f_1 = 2.7$ Hz). The process was repeated with a sampling interval of 1/64 s, that is, for $f_0 = 1/4$ Hz and $M = 11$ sampling points. The iteration yielded two frequencies, $f_1 = 2.5$ Hz and $f_2 = 2.75$ Hz, with amplitudes $|c_1| = 0.616$, $|c_2| = 0.630$ and phases $30.06^\circ$, $31.44^\circ$, respectively. The location $\hat f$ and amplitude $c$ of the unknown frequency were finally estimated by interpolation, yielding

$$\hat f = f_1 + \frac{|c_2|}{|c_1| + |c_2|}\,f_0 = 2.63 \text{ Hz}, \qquad c = c_1 + c_2 = 1.246\angle 30.76^\circ.$$

In Fig. 18 we show the results for S/N = 8 dB and $f_0 = 1$, 0.5, and 0.25 Hz.

Fig. 18. $w_1(t) = 1.25\cos(5.4\pi t + 30^\circ) + n(t)$, $|t| < 0.08$ s. (a) Fourier transform of $w_1(t)$. (b) Result of the iteration for $f_0 = 1$ Hz ($M = 41$) and $n = 100$: $f(t) = 1.38\cos(6\pi t + 25.5^\circ)$. (c) Result of the iteration for $f_0 = 0.5$ Hz ($M = 21$) and $n = 200$: $f(t) \approx 1.23\cos(5\pi t + 31.8^\circ)$. (d) Result of the iteration for $f_0 = 0.25$ Hz ($M = 11$) and $n = 200$: $f(t) = 0.62\cos(5\pi t + 30.1^\circ) + 0.63\cos(5.5\pi t + 31.4^\circ) \approx 1.246\cos(2\pi \cdot 2.63\,t + 30.7^\circ)$.


b) $n(t) \neq 0$. We considered four different signal-to-noise ratios. Each case was run 20 times using different samples of the noise. The statistical conclusions are described in Table I. The numbers in the table are the estimates, in Hz, of the unknown frequency; numbers in parentheses indicate in how many of the 20 samples the cited estimate was obtained.
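The interpolation used in Example 4 a) can be checked numerically. The sketch below (plain Python; the function name `interpolate_line` is hypothetical) takes the two neighbouring lines $f_1 = 2.5$ Hz and $f_1 + f_0 = 2.75$ Hz with the amplitudes and phases reported above, places the estimate between them in proportion to $|c_2|/(|c_1| + |c_2|)$, and adds the two phasors.

```python
import cmath
import math


def interpolate_line(f1, c1, f0, c2):
    """Merge two neighbouring grid lines f1 and f1 + f0 (complex amplitudes
    c1, c2) into one line, as in Example 4:
        f = f1 + f0 * |c2| / (|c1| + |c2|),   c = c1 + c2.
    Returns (frequency, amplitude, phase in degrees)."""
    f_hat = f1 + f0 * abs(c2) / (abs(c1) + abs(c2))
    c_hat = c1 + c2
    return f_hat, abs(c_hat), math.degrees(cmath.phase(c_hat))


c1 = cmath.rect(0.616, math.radians(30.06))
c2 = cmath.rect(0.630, math.radians(31.44))
f_hat, amp, phase = interpolate_line(2.5, c1, 0.25, c2)
print(f"{f_hat:.2f} Hz, {amp:.3f}, {phase:.2f} deg")  # prints 2.63 Hz, 1.246, 30.76 deg
```

Because the two phases differ by little more than a degree, the phasor sum is almost the weighted average of the phases, which is why the combined amplitude is practically $|c_1| + |c_2|$.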
3. Undersampled Data.

The problem of estimating the spectrum $S(\omega)$ of a signal from undersampled data was considered. We showed that, although it is not possible in general to recover $S(\omega)$ reliably, adequate estimates are possible in special cases.

The  results  were  presented  in  the  following  paper: 


"Spectral  Estimation  from  Random  Samples" 

IEEE  International  Conference  on  Information  Sciences  and  Systems, 
Patras,  Greece,  1979. 


4.  Spectral  Estimation. 

The fundamental problem of estimating the spectrum $S(\omega)$ of a random signal in terms of a single realization was considered, with emphasis on the method of maximum entropy. Recent results led to the following two papers:


"Entropy:  From  first  Principles  to  Spectral  Estimation" 

IEEE  Tr-ASSP  Workshop  on  Spectral  Estimation,  Hamilton,  Ontario,  1981. 


"Maximum  Entropy  and  Spectral  Estimation" 
IEEE  Tr-ASSP  (to  appear). 
ABSTRACT

The  method  of  maximum  entropy  is  reviewed  with  emphasis  on  its 
relationship  to  entropy  rate,  Wiener  filters,  autoregressive  processes, 
extrapolation,  the  Levinson  algorithm,  lattice,  all-pole,  and  all-pass  filters 
and  stability. 
I. Introduction

In the last decade, several papers have been published discussing a method of spectral estimation based on the principle of maximum entropy [1]-[4] and the relationship of this method to entropy rate [5], the Wiener theory of prediction [6], [7], autoregressive processes, the Levinson algorithm [8], lattice filters [9], all-pole and all-pass filters, and stability. However, it appears that no single publication in the open literature explains simply the interconnection of these topics. The purpose of this paper is an attempt to do so, starting from first principles [10]. The effectiveness of the method in the solution of specific problems will not be considered here. In the Appendix, we comment briefly on its conceptual justification. The material is developed with some originality; however, the paper is essentially tutorial.

The entire development is based on the orthogonality principle [11]:

In the estimation of a random variable $y$ by a linear combination

$$\hat y = a_1 x_1 + \cdots + a_N x_N \qquad (1)$$

of the $N$ random variables $x_1, \ldots, x_N$ (data), the MS error

$$P = E\{(y - \hat y)^2\} \qquad (2)$$

is minimum if the estimation error

$$e = y - \hat y \qquad (3)$$

is orthogonal to the data $x_k$, that is, if

$$E\{e\,x_k\} = 0, \qquad k = 1, \ldots, N \qquad (4)$$

The resulting MS error $P$ is then given by

$$P = E\{e^2\} = E\{e\,y\} \qquad (5)$$

We also state, for later use, the following results from the theory of linear systems with stochastic inputs [11]: Suppose that the input to a discrete linear system is a stationary process $x[n]$ with autocorrelation

$$R_{xx}[m] = E\{x[n+m]\,x[n]\} \qquad (6)$$

and power spectrum

$$S_{xx}(z) = \sum_{m=-\infty}^{\infty} R_{xx}[m]\,z^{-m} \qquad (7)$$

If $h[n]$ is the delta response and $H(z)$ the system function of the system, then the power spectrum of the resulting output $y[n] = x[n] * h[n]$ is given by

$$S_{yy}(z) = S_{xx}(z)\,H(z)\,H(1/z) \qquad (8)$$

In the above we assumed, as we shall throughout the paper, that all processes and systems are real. With trivial modifications, the results hold also for complex processes. The spectral estimation problem has two parts:

1. Deterministic: Estimate the power spectrum $S(z)$ of a process $s[n]$ in terms of the $N + 1$ values $R[0], R[1], \ldots, R[N]$ of its autocorrelation.

2. Random: Estimate the power spectrum $S(z)$ of a process $s[n]$ in terms of the values $s[1], s[2], \ldots, s[N_0]$ of a single realization of $s[n]$.

As we show in the paper, the maximum entropy solution of Part 1 can be presented as a recursive modification of the Wiener prediction filter. The modification is based on the Levinson algorithm expressed in terms of forward and backward predictors. The solution of Part 2 is given by an estimator whose various parameters satisfy the same equations as in the deterministic case, with the only difference that in the evaluation of the recursion coefficient $\Gamma_N$ [see (53)], all ensemble averages are replaced by suitable time-averages.

II. Prediction

We wish to estimate the future value $s[n+1]$ of a random signal $s[n]$ in terms of the sum

$$\hat s_N[n+1] = a_1^N s[n] + \cdots + a_N^N s[n-N+1] = \sum_{k=1}^{N} a_k^N s[n-k+1] \qquad (9)$$

involving its $N$ most recent values $s[n-k+1]$. The set of weights $a_k^N$ that minimizes the MS value of the prediction error defines an FIR filter of order $N$ (Fig. 1) called the (one-step) forward predictor of $s[n]$. The superscript $N$ in $a_k^N$ specifies the order of the predictor. Since $s[n]$ is stationary, the optimum weights $a_k^N$ are independent of $n$. We can, therefore, give the variable $n$ in (9) any value. With

$$e_N[n] = s[n] - \hat s_N[n] \qquad (10)$$

the forward predictor error, we have [see (4)]

$$E\{e_N[n]\,s[n-k]\} = 0, \qquad 1 \le k \le N \qquad (11)$$


This yields the system

$$\begin{aligned} R[0]a_1^N + R[1]a_2^N + \cdots + R[N-1]a_N^N &= R[1] \\ R[1]a_1^N + R[0]a_2^N + \cdots + R[N-2]a_N^N &= R[2] \\ &\;\;\vdots \\ R[N-1]a_1^N + R[N-2]a_2^N + \cdots + R[0]a_N^N &= R[N] \end{aligned} \qquad (12)$$

expressing the predictor coefficients $a_k^N$ in terms of the $N + 1$ values $R[0], \ldots, R[N]$ of the autocorrelation $R[m]$ of $s[n]$. In the next section, we discuss a recursion method for solving this system.
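For small orders, the system (12) can simply be solved directly. The NumPy sketch below (function name hypothetical, real process assumed) builds the matrix with entries $R[|k-r|]$, solves for $a_1^N, \ldots, a_N^N$, and evaluates the MS error from (13). For $R[m] = 0.8^{|m|}$, the autocorrelation of a first-order AR process, it returns $a^3 \approx (0.8, 0, 0)$ and $P_3 \approx 0.36$.

```python
import numpy as np


def predictor_direct(R, N):
    """Solve the normal equations (12) for a_1^N, ..., a_N^N and return them
    with the MS error P_N of (13). R holds R[0], ..., R[N] of a real process."""
    R = np.asarray(R, dtype=float)
    # Toeplitz matrix: entry (k, r) is R[|k - r|]
    idx = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    a = np.linalg.solve(R[idx], R[1:N + 1])
    P = R[0] - a @ R[1:N + 1]
    return a, P


# R[m] = 0.8**|m|: autocorrelation of a first-order AR process
a, P = predictor_direct(0.8 ** np.arange(4), 3)
```

A general solver costs $O(N^3)$ operations; the recursion developed in Section III exploits the Toeplitz structure instead.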

Applying (5) to our estimator, we conclude that the MS estimation error $P_N = E\{e_N^2[n]\}$ is given by

$$P_N = E\{e_N[n]\,s[n]\} = R[0] - \sum_{k=1}^{N} a_k^N R[k] \qquad (13)$$


Note: Consider two processes $s[n]$ and $s_1[n]$ with autocorrelations $R[m]$ and $R_1[m]$, respectively. From (12) it follows that, if

$$R[m] = R_1[m] \quad \text{for } 0 \le m \le N \qquad (14)$$

then the $N$th order predictors of $s[n]$ and $s_1[n]$ are identical. Conversely, if the $N$th order predictors of $s[n]$ and $s_1[n]$ are identical, then (12) shows that $R[m] = c\,R_1[m]$ for $0 \le m \le N$. The proportionality constant $c$ equals 1 if $R[m]$ and $R_1[m]$ satisfy one additional equation, for example, if the $N$th order MS errors are equal or if $R_1[0] = R[0]$, that is, if the two processes have the same average power

$$P_0 = E\{s^2[n]\} = R[0]$$


The prediction error [see (10)]

$$e_N[n] = s[n] - \sum_{k=1}^{N} a_k^N s[n-k]$$

is the output of a system (Fig. 1) with input $s[n]$ and system function

$$H_N(z) = 1 - \sum_{k=1}^{N} a_k^N z^{-k} \qquad (17)$$

This system will be called the forward error filter.

The backward predictor. We shall now estimate the process $s[n]$ in terms of the backward predictor

$$\check s_N[n] = \sum_{k=1}^{N} b_k^N s[n+k]$$

involving its $N$ closest future values $s[n+k]$. With

$$\check e_N[n] = s[n] - \check s_N[n]$$

the backward predictor error, we have as in (11)

$$E\{\check e_N[n]\,s[n+k]\} = 0, \qquad 1 \le k \le N \qquad (19)$$

This yields the system

$$\sum_{r=1}^{N} b_r^N R[r-k] = R[k], \qquad 1 \le k \le N$$

which is identical to the system (12). Hence $b_k^N = a_k^N$, that is, the backward predictor of $s[n]$ is the sum

$$\check s_N[n] = a_1^N s[n+1] + \cdots + a_N^N s[n+N] \qquad (20)$$

The predictor error is thus given by

$$\check e_N[n] = s[n] - \sum_{k=1}^{N} a_k^N s[n+k] \qquad (21)$$


Denoting by $\check P_N$ its MS value, we conclude as in (13) that

$$\check P_N = E\{\check e_N[n]\,s[n]\} = R[0] - \sum_{k=1}^{N} a_k^N R[k]$$

In other words, the forward and backward MS predictor errors are equal:

$$\check P_N = P_N \qquad (22)$$


Clearly, $\check e_N[n]$ is the output of a system with input $s[n]$ and system function

$$\check H_N(z) = 1 - \sum_{k=1}^{N} a_k^N z^{k} \qquad (23)$$

This system will be called the backward error filter. Comparing with (17), we conclude that

$$\check H_N(z) = H_N(1/z) \qquad (24)$$




In the above, we assumed that $s[n]$ is a real process. The results hold also for complex processes subject to the following modifications:

$$R[-m] = R^*[m], \qquad b_k^N = (a_k^N)^*, \qquad \check H_N(z) = H_N^*(1/z^*)$$

Autoregressive processes. An autoregressive (AR) process of order $M$ is a random signal $s[n]$ satisfying the recursion equation

$$s[n] - c_1 s[n-1] - \cdots - c_M s[n-M] = \zeta[n] \qquad (25)$$

where $\zeta[n]$ is stationary white noise with

$$E\{\zeta[n+m]\,\zeta[n]\} = P\,\delta[m] \qquad (26)$$

From the definition it follows that $s[n]$ is the output of a linear system with input $\zeta[n]$ and system function

$$T(z) = \frac{1}{1 - \sum_{k=1}^{M} c_k z^{-k}} \qquad (27)$$


If this system is stable, then $s[n]$ is a stationary process given by

$$s[n] = \sum_{r=0}^{\infty} h[r]\,\zeta[n-r] \qquad (28)$$

where $h[n]$ is the causal inverse transform [12] of $T(z)$. This shows that for any $k \ge 1$, the random variable $s[n-k]$ is a linear combination of only the past values of $\zeta[n]$; hence

$$E\{s[n-k]\,\zeta[n]\} = 0, \qquad k \ge 1 \qquad (29)$$

because $\zeta[n]$ is white noise by assumption.
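To see (25) and (29) at work, the sketch below simulates an AR process of order 2 with zero initial state and estimates the correlation coefficient between $s[n-k]$ and $\zeta[n]$. The coefficients $c_1 = 0.9$, $c_2 = -0.5$ are an arbitrary stable choice (the roots of $z^2 - 0.9z + 0.5$ have magnitude $\sqrt{0.5}$), the function name is hypothetical, and the use of Gaussian noise is an assumption; (29) itself only requires whiteness.

```python
import numpy as np


def simulate_ar(c, n_samples, power=1.0, seed=0):
    """Generate s[n] = c_1 s[n-1] + ... + c_M s[n-M] + zeta[n] (eq. 25), with
    zeta[n] white Gaussian noise of average power `power` and zero initial state.
    Returns (s, zeta)."""
    rng = np.random.default_rng(seed)
    zeta = rng.normal(scale=np.sqrt(power), size=n_samples)
    s = np.zeros(n_samples)
    for n in range(n_samples):
        past = sum(ck * s[n - k] for k, ck in enumerate(c, start=1) if n - k >= 0)
        s[n] = past + zeta[n]
    return s, zeta


s, zeta = simulate_ar([0.9, -0.5], 100_000)
# Sample counterpart of (29): correlation of s[n-k] with zeta[n]
corr = [np.corrcoef(s[:-k], zeta[k:])[0, 1] for k in (1, 2, 3)]
same_time = np.corrcoef(s, zeta)[0, 1]
```

The sample correlations for $k = 1, 2, 3$ come out close to zero, as (29) predicts, whereas $s[n]$ and $\zeta[n]$ at the same instant are clearly correlated.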

We maintain that the predictor $\hat s_N[n]$ of $s[n]$ of order $N \ge M$ is the sum

$$\hat s_N[n] = c_1 s[n-1] + \cdots + c_M s[n-M] \qquad (30)$$
$$R[N] = \sum_{k=1}^{M} c_k R[N-k] \quad \text{for every } N \ge M \qquad (36)$$

Setting $N = M$, we obtain a system of $M$ equations expressing the $M$ coefficients $c_k$ in terms of the first $M + 1$ values of $R[m]$. With $c_k$ so determined, we can use (36) to evaluate successively $R[N]$ for every $N > M$. Thus, (36) is an extrapolation formula for $R[N]$.
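The extrapolation property of (36) takes only a few lines. In the sketch below (function name hypothetical), the AR(2) coefficients $c_1 = 0.9$, $c_2 = -0.5$ and the normalized starting values $R[0] = 1$, $R[1] = 0.6$, $R[2] = 0.04$ are an illustrative choice; the starting values are consistent with this model, since $R[m] = c_1 R[m-1] + c_2 R[m-2]$ holds for $m = 1, 2$ (with $R[-1] = R[1]$).

```python
import numpy as np


def extrapolate_autocorr(c, R_init, n_total):
    """Extend R[0], ..., R[M] of an AR(M) process to R[0], ..., R[n_total-1]
    using R[N] = sum_k c_k R[N-k] (eq. 36) for N > M."""
    M = len(c)
    R = [float(r) for r in R_init[:M + 1]]
    for N in range(M + 1, n_total):
        R.append(sum(c[k - 1] * R[N - k] for k in range(1, M + 1)))
    return np.array(R)


R = extrapolate_autocorr([0.9, -0.5], [1.0, 0.6, 0.04], 5)
```

The result is approximately $[1, 0.6, 0.04, -0.264, -0.2576]$: each new value is a fixed linear combination of the two preceding ones.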


III.  The  Levinson  Algorithm 

The solution of the system (12) involves the inversion of the matrix

$$\begin{bmatrix} R[0] & R[1] & \cdots & R[N-1] \\ R[1] & R[0] & \cdots & R[N-2] \\ \vdots & \vdots & \ddots & \vdots \\ R[N-1] & R[N-2] & \cdots & R[0] \end{bmatrix} \qquad (37)$$

This matrix has a special form (Toeplitz [13]) and can be inverted easily by a simple iteration known as Levinson's algorithm [8]. We shall present the result as a recursion involving directly the predictor coefficients.

Theorem. The forward predictor $\hat s_N[n]$ can be written as a sum

$$\hat s_N[n] = \hat s_{N-1}[n] + \Gamma_N\big(s[n-N] - \check s_{N-1}[n-N]\big) \qquad (38)$$

where

$$\hat s_{N-1}[n] = \sum_{k=1}^{N-1} a_k^{N-1} s[n-k], \qquad \check s_{N-1}[n-N] = \sum_{k=1}^{N-1} a_k^{N-1} s[n-N+k] \qquad (39)$$

are the forward and backward predictors of $s[n]$ and $s[n-N]$, respectively, and the coefficient $\Gamma_N$ is a constant to be determined. Equation (38) can be expressed in terms of the forward and backward predictor errors

$$e_{N-1}[n] = s[n] - \hat s_{N-1}[n], \qquad \check e_{N-1}[n-N] = s[n-N] - \check s_{N-1}[n-N]$$

Indeed, subtracting both sides of (38) from $s[n]$, we obtain

$$e_N[n] = e_{N-1}[n] - \Gamma_N\,\check e_{N-1}[n-N] \qquad (40)$$

It suffices, therefore, to prove (40).

Proof by induction. Clearly $\hat s_N[n]$ is a linear combination of the $N$ most recent values $s[n-k]$ of the signal. It suffices, therefore, to show that $e_N[n]$ satisfies the orthogonality condition. By the induction hypothesis, we assume that the sequences $e_{N-1}[n]$ and $\check e_{N-1}[n]$ are the predictor errors of order $N-1$, that is [see (11) and (19)],

$$E\{e_{N-1}[n]\,s[n-k]\} = 0, \qquad E\{\check e_{N-1}[n-N]\,s[n-k]\} = 0, \qquad 1 \le k \le N-1 \qquad (41)$$

We shall show that if $e_N[n]$ is given by (40), then it is the $N$th order predictor error. As we know, this is true if

$$E\{e_N[n]\,s[n-k]\} = 0, \qquad 1 \le k \le N \qquad (42)$$

From (41) it follows that (42) holds for $1 \le k \le N-1$. It suffices, therefore, to select $\Gamma_N$ so as to satisfy (42) for $k = N$: $E\{e_N[n]\,s[n-N]\} = 0$. Inserting (40) into the above, we obtain

$$E\{e_{N-1}[n]\,s[n-N]\} = \Gamma_N\,E\{\check e_{N-1}[n-N]\,s[n-N]\} \qquad (43)$$


and since [see (39)]

$$E\{e_{N-1}[n]\,s[n-N]\} = E\Big\{\Big(s[n] - \sum_{k=1}^{N-1} a_k^{N-1} s[n-k]\Big)s[n-N]\Big\} = R[N] - \sum_{k=1}^{N-1} a_k^{N-1} R[N-k]$$

and

$$E\{\check e_{N-1}[n-N]\,s[n-N]\} = P_{N-1}$$

(43) yields

$$\Gamma_N P_{N-1} = R[N] - \sum_{k=1}^{N-1} a_k^{N-1} R[N-k] \qquad (44)$$

We have, thus, expressed $\Gamma_N$ in terms of the coefficients $a_k^{N-1}$ of the predictor of order $N-1$ and the corresponding MS error $P_{N-1}$. This error is given also by [see (22)]

$$P_{N-1} = E\{\check e_{N-1}^2[n]\} \qquad (45)$$

With this choice of $\Gamma_N$, (42) holds for every $k$ from 1 to $N$.

Using (38), we can express $a_k^N$ in terms of $a_k^{N-1}$ and the constant $\Gamma_N$. Indeed, with $\hat s_N[n]$ as in (9), we obtain, equating coefficients of both sides of (38), the recursive equation

$$a_k^N = a_k^{N-1} - \Gamma_N\,a_{N-k}^{N-1}, \quad 1 \le k \le N-1, \qquad a_N^N = \Gamma_N \qquad (46)$$

where $\Gamma_N$ is determined from (44). Since this equation involves the MS error $P_{N-1}$, to complete the induction we must determine its $N$th order value

$$P_N = E\{e_N^2[n]\} \qquad (47)$$

We maintain that

$$P_N = (1 - \Gamma_N^2)\,P_{N-1} \qquad (48)$$

To show this, we insert $e_N[n]$, as given by (40), into (47) and use (45) and the fact that

$$E\{\check e_{N-1}[n-N]\,s[n]\} = E\Big\{\Big(s[n-N] - \sum_{k=1}^{N-1} a_k^{N-1} s[n-N+k]\Big)s[n]\Big\} = R[N] - \sum_{k=1}^{N-1} a_k^{N-1} R[N-k] = \Gamma_N P_{N-1}$$
The result is Equation (48). The induction starts with $\Gamma_0 = 0$, $P_0 = E\{s^2[n]\} = R[0]$, and for $N = 1$ it yields $\Gamma_1 P_0 = R[1]$, $P_1 = P_0(1 - \Gamma_1^2)$.
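Equations (44), (46), and (48) translate directly into a recursion that needs only $O(N^2)$ operations instead of a general matrix inversion. The NumPy sketch below (function name hypothetical, real process assumed) starts from $P_0 = R[0]$ and an empty predictor, which reproduces $\Gamma_1 P_0 = R[1]$ at the first step.

```python
import numpy as np


def levinson(R, N):
    """Levinson recursion in predictor form, following (44), (46) and (48).

    R holds R[0], ..., R[N] of a real process. Returns the order-N coefficients
    a_1^N..a_N^N, the reflection coefficients Gamma_1..Gamma_N, and P_0..P_N.
    """
    R = np.asarray(R, dtype=float)
    a = np.zeros(0)            # order-0 predictor: no coefficients
    P = [R[0]]                 # P_0 = R[0]
    gammas = []
    for n in range(1, N + 1):
        # (44): Gamma_n P_{n-1} = R[n] - sum_{k=1}^{n-1} a_k^{n-1} R[n-k]
        g = (R[n] - a @ R[n - 1:0:-1]) / P[-1]
        # (46): a_k^n = a_k^{n-1} - Gamma_n a_{n-k}^{n-1}, and a_n^n = Gamma_n
        a = np.concatenate([a - g * a[::-1], [g]])
        P.append(P[-1] * (1 - g * g))   # (48)
        gammas.append(g)
    return a, np.array(gammas), np.array(P)


a, gammas, P = levinson(0.8 ** np.arange(4), 3)
# gammas is approximately [0.8, 0, 0] and P approximately [1, 0.36, 0.36, 0.36]
```

Applied to $R[m] = 0.8^{|m|}$, the reflection coefficients beyond the first vanish: once the order reaches that of the underlying AR(1) process, the predictor stops improving and $P_N$ stays at 0.36.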

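The recursion (38) and its equivalent (40) hold also for the backward predictors. Reasoning similarly, we obtain

$$\check s_N[n-N] = \check s_{N-1}[n-N] + \Gamma_N\big(s[n] - \hat s_{N-1}[n]\big) \qquad (49)$$

$$\check e_N[n-N] = \check e_{N-1}[n-N] - \Gamma_N\,e_{N-1}[n] \qquad (50)$$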

Lattice. Equations (40) and (50) can be given the following graphical interpretation [9], [14]. In Fig. 2, we show $N$ lattice sections connected in cascade. Each section consists of one delay element and two multipliers. The input to the system so formed equals $s[n] = e_0[n] = \check e_0[n]$, and the two outputs equal the forward and backward predictor errors.

We note that the transfer functions from the input A to the two outputs B and C equal $H_N(z)$ and

$$z^{-N}\check H_N(z) = z^{-N} H_N(1/z)$$

respectively, where $H_N(z)$ is the forward error filter and $\check H_N(z)$ is the backward error filter.
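The lattice of Fig. 2 can be simulated by storing, at every stage, the forward error and the backward error delayed by the stage order, $b_N[n] = \check e_N[n-N]$ (a bookkeeping name introduced here, not the paper's). With this notation, (40) and (50) become $e_N[n] = e_{N-1}[n] - \Gamma_N b_{N-1}[n-1]$ and $b_N[n] = b_{N-1}[n-1] - \Gamma_N e_{N-1}[n]$, so each section needs one unit delay and two multipliers. The NumPy sketch below (function name hypothetical) assumes zero initial state, i.e. $s[n] = 0$ for $n < 0$.

```python
import numpy as np


def lattice_errors(s, gammas):
    """Cascade of lattice sections implementing (40) and (50).

    Keeps e[n] = e_N[n] and b[n] = e-check_N[n-N] (the delayed backward error
    seen at output C). Each section applies
        e_N[n] = e_{N-1}[n]   - Gamma_N * b_{N-1}[n-1]
        b_N[n] = b_{N-1}[n-1] - Gamma_N * e_{N-1}[n]
    with zero initial state.
    """
    e = np.asarray(s, dtype=float).copy()   # e_0[n] = s[n]
    b = e.copy()                            # b_0[n] = e-check_0[n] = s[n]
    for g in gammas:
        b_delayed = np.concatenate([[0.0], b[:-1]])   # the section's unit delay
        e, b = e - g * b_delayed, b_delayed - g * e
    return e, b


s = np.array([1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0])
e, b = lattice_errors(s, [0.5, -0.3])
```

For $\Gamma_1 = 0.5$, $\Gamma_2 = -0.3$, recursion (46) gives $a_1^2 = 0.65$, $a_2^2 = -0.3$, and the two outputs coincide with FIR filtering by $H_2(z) = 1 - 0.65z^{-1} + 0.3z^{-2}$ and by $z^{-2}H_2(1/z) = 0.3 - 0.65z^{-1} + z^{-2}$, respectively.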

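Note: We derive next, for later use, a modified form of (44). Clearly [see (40)],

$$P_N = E\{e_N^2[n]\} = E\big\{\big(e_{N-1}[n] - \Gamma_N\,\check e_{N-1}[n-N]\big)^2\big\}$$

Since the coefficients of the predictor minimize $P_N$, we must have

$$\frac{\partial P_N}{\partial \Gamma_N} = 0 = E\big\{-2\big(e_{N-1}[n] - \Gamma_N\,\check e_{N-1}[n-N]\big)\,\check e_{N-1}[n-N]\big\}$$

Hence,

$$\Gamma_N = \frac{E\{e_{N-1}[n]\,\check e_{N-1}[n-N]\}}{E\{\check e_{N-1}^2[n-N]\}}$$

The above can be written in the symmetrical form

$$\Gamma_N = \frac{2E\{e_{N-1}[n]\,\check e_{N-1}[n-N]\}}{E\{e_{N-1}^2[n]\} + E\{\check e_{N-1}^2[n-N]\}} \qquad (53)$$

This is a consequence of the fact that the forward and backward MS errors are equal. It can also be derived by writing $P_N$ in the symmetrical form

$$P_N = \tfrac{1}{2}E\big\{\big(e_{N-1}[n] - \Gamma_N\,\check e_{N-1}[n-N]\big)^2 + \big(\check e_{N-1}[n-N] - \Gamma_N\,e_{N-1}[n]\big)^2\big\} \qquad (54)$$

and minimizing with respect to $\Gamma_N$.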

Stability. We have shown that the $N$th order MS error is given by $P_N = (1 - \Gamma_N^2)\,P_{N-1}$. Since $0 < P_N \le P_{N-1}$, it follows that

$$|\Gamma_N| < 1 \qquad (55)$$

with $\Gamma_N = 0$ iff $P_N = P_{N-1}$. We shall use this result to show that the forward error filter

$$H_N(z) = 1 - \sum_{k=1}^{N} a_k^N z^{-k} \qquad (56)$$

is a Hurwitz polynomial, i.e., all its roots $z_i$ are inside the unit circle [14]:

$$|z_i| < 1 \qquad (57)$$

From this it will follow that all roots of the backward error filter $\check H_N(z)$ are outside the unit circle because

$$\check H_N(z) = H_N(1/z) \qquad (58)$$

Proof by induction. Clearly, $H_1(z) = 1 - \Gamma_1 z^{-1}$; hence, $|z_1| = |\Gamma_1| < 1$. Suppose that (57) is true for all orders up to $N-1$. We shall show that it is

This  proof  was  suggested  to  the  author  by  Th.  Andrikos. 
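The stability result can be checked numerically by building $H_N(z)$ from an arbitrary set of reflection coefficients via (46) and computing its roots. The sketch below (function names hypothetical) uses NumPy's `np.roots`, which expects polynomial coefficients in descending powers, so $z^N H_N(z) = z^N - a_1^N z^{N-1} - \cdots - a_N^N$ is passed as $[1, -a_1^N, \ldots, -a_N^N]$.

```python
import numpy as np


def step_up(gammas):
    """Build a_1^N..a_N^N from Gamma_1..Gamma_N with recursion (46)."""
    a = np.zeros(0)
    for g in gammas:
        a = np.concatenate([a - g * a[::-1], [g]])
    return a


def error_filter_roots(a):
    """Roots of H_N(z) = 1 - sum_k a_k z^{-k}, i.e. of z^N - a_1 z^{N-1} - ... - a_N."""
    return np.roots(np.concatenate([[1.0], -np.asarray(a, dtype=float)]))


roots = error_filter_roots(step_up([0.9, -0.7, 0.5, 0.95]))   # all |Gamma_N| < 1
print(np.abs(roots).max())   # below 1, as (57) asserts
```

With every $|\Gamma_N| < 1$ all roots fall inside the unit circle; replacing one coefficient by a value larger than 1, e.g. $\Gamma_2 = 1.5$, pushes a root outside.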
IV. Spectral Estimation

We  shall  now  relate  the  preceding  results  to  the  method  of  maximum 
entropy. 

Deterministic case. We are given the first $N + 1$ values of the autocorrelation $R[m]$ of a random process $s[n]$, and we wish to estimate its power spectrum

$$S(z) = \sum_{m=-\infty}^{\infty} R[m]\,z^{-m} \qquad (65)$$

For this purpose, we shall construct an AR process $s_o[n]$ of order $N$ with autocorrelation $R_o[m]$ such that $R_o[m] = R[m]$ for $|m| \le N$. The power spectrum $S_o(z)$ of this process will be used as the estimate of $S(z)$.

The construction of $s_o[n]$ is based on the determination of the $N$th order predictor

$$\hat s_N[n] = \sum_{k=1}^{N} a_k^N s[n-k] \qquad (66)$$


of $s[n]$. As we have shown, the coefficients $a_k^N$ of this predictor can be determined by solving the system (12) or, equivalently, from the recursion equations

$$\begin{aligned} a_k^N &= a_k^{N-1} - \Gamma_N\,a_{N-k}^{N-1}, \quad 1 \le k \le N-1, \qquad a_N^N = \Gamma_N \\ \Gamma_N P_{N-1} &= R[N] - \sum_{k=1}^{N-1} a_k^{N-1} R[N-k], \qquad N \ge 1 \\ P_N &= (1 - \Gamma_N^2)\,P_{N-1} \end{aligned} \qquad (67)$$

with the initial condition $P_0 = R[0]$. In either case, the solution is uniquely determined in terms of the known values of $R[m]$. We next form the AR filter of Fig. 3 with system function


$$T(z) = \frac{1}{H_N(z)}, \qquad \text{where } H_N(z) = 1 - \sum_{k=1}^{N} a_k^N z^{-k} \qquad (68)$$

is the error filter of the predictor $\hat s_N[n]$ of $s[n]$ [see (17)]. As input to this system we use a stationary white noise process $\zeta[n]$ with average power $P_N$. Denoting by $s_o[n]$ the resulting output, we conclude that

$$s_o[n] - \sum_{k=1}^{N} a_k^N s_o[n-k] = \zeta[n] \qquad (69)$$


The system $T(z)$ is stable because $H_N(z)$ is a Hurwitz polynomial. Therefore, its output $s_o[n]$ is stationary and, since it satisfies (69), it is AR. From this and (32), it follows that the $N$th order predictor $\hat s_{oN}[n]$ of $s_o[n]$ is given by

$$\hat s_{oN}[n] = \sum_{k=1}^{N} a_k^N s_o[n-k] \qquad (70)$$


This shows that the process $s_o[n]$ of Fig. 3 and the original process $s[n]$ have identical predictors; therefore [see (14)], their corresponding autocorrelations $R_o[m]$ and $R[m]$ are equal for $|m| \le N$ within a factor. We maintain that this factor is one. Indeed, with $e_o[n] = s_o[n] - \hat s_{oN}[n]$ the prediction error of $s_o[n]$, it follows from (69) and (70) that

$$E\{e_o^2[n]\} = E\{\zeta^2[n]\} = P_N$$

hence (see the Note following (13))

$$R_o[m] = R[m], \qquad |m| \le N \qquad (71)$$

This shows that if we use as the estimate of the unknown spectrum $S(z)$ of $s[n]$ the spectrum $S_o(z)$ of the AR process $s_o[n]$, its inverse transform will agree with the given values of $R[m]$. Since $S_{\zeta\zeta}(z) = P_N$, it follows from (34) that

$$S_o(z) = \frac{P_N}{H_N(z)\,H_N(1/z)} \qquad (72)$$

and on the unit circle,

$$S_o(\omega) = S_o(e^{j\omega}) = \frac{P_N}{\big|1 - \sum_{k=1}^{N} a_k^N e^{-jk\omega}\big|^2} \qquad (73)$$


This is the maximum entropy estimate of the unknown spectrum $S(\omega)$. The numerator $P_N$ and the coefficients $a_k^N$ are determined from (67).
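The construction above can be verified numerically: compute $a_k^N$ and $P_N$ from (67), evaluate (73) on a dense frequency grid, and invert the spectrum. The NumPy sketch below (function names hypothetical) does this with an FFT; by (71), the first $N + 1$ values of the inverse transform should reproduce the given $R[m]$. The test autocorrelation $R[m] = 0.9^{|m|}\cos 0.5m$ is an arbitrary valid choice, and the inverse FFT approximates $R_o[m]$ up to an aliasing error that is negligible on a 4096-point grid for this example.

```python
import numpy as np


def levinson(R, N):
    """Recursion (67): returns a_1^N..a_N^N and P_N from R[0], ..., R[N]."""
    a, P = np.zeros(0), R[0]
    for n in range(1, N + 1):
        g = (R[n] - a @ R[n - 1:0:-1]) / P
        a = np.concatenate([a - g * a[::-1], [g]])
        P *= 1 - g * g
    return a, P


def max_entropy_spectrum(R, N, n_freq=4096):
    """Evaluate S_o(w) = P_N / |1 - sum_k a_k^N e^{-jkw}|^2 (eq. 73) on the
    grid w_i = 2*pi*i/n_freq, i = 0, ..., n_freq-1."""
    a, P = levinson(np.asarray(R, dtype=float), N)
    w = 2 * np.pi * np.arange(n_freq) / n_freq
    H = np.fft.fft(np.concatenate([[1.0], -a]), n_freq)   # H_N(e^{jw}) on the grid
    return w, P / np.abs(H) ** 2


m = np.arange(5)
R = 0.9 ** m * np.cos(0.5 * m)            # illustrative R[0..4]
w, S = max_entropy_spectrum(R, 4)
R_o = np.fft.ifft(S).real                 # inverse transform of S_o, cf. (71)
```

Beyond $|m| = N$ the values `R_o[m]` continue according to the AR extrapolation (36), which is precisely the maximum entropy extension of the given data.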

Random case. We are given the samples (data) $s[1], s[2], \ldots, s[N_0]$ of a single realization of a process $s[n]$, and we wish to estimate its power spectrum $S(\omega)$. The maximum entropy estimator $\hat S(\omega)$ of $S(\omega)$ is an all-pole function as in (73). However, unlike the deterministic case, the value of $N$ is not specified. The problem now is first to select $N$ and then to estimate the coefficients $a_k^N$. Suppose that we have somehow decided on the value of $N$. We then proceed as in the deterministic case using the recursion equations (67). These equations specify $a_k^N$ in terms of the constants $\Gamma_N$ and the initial condition $P_0 = R[0]$. It suffices, therefore, to determine the estimates $\hat\Gamma_N$ and $\hat P_0$ of these constants by appropriate time-averages involving the given data.

For the estimate of $P_0$, we use the sum

$$\hat P_0 = \frac{1}{N_0}\sum_{n=1}^{N_0} s^2[n] \qquad (74)$$

For the estimate of $\Gamma_N$, we use the time-average version of (44) or (53). As we have shown, these equations are equivalent; however, because of end-effects, the corresponding time-averages are not equivalent. We shall use the latter because, unlike (44), it leads to an estimate that satisfies the stability condition

$$|\hat\Gamma_N| \le 1 \qquad (75)$$

Our problem, thus, is reduced to the determination of the time-average form of the equation

$$\Gamma_N = \frac{E\{e_{N-1}[n]\,\check e_{N-1}[n-N]\}}{\tfrac{1}{2}E\{e_{N-1}^2[n] + \check e_{N-1}^2[n-N]\}} \qquad (76)$$

where

$$e_{N-1}[n] = s[n] - \big(a_1^{N-1} s[n-1] + \cdots + a_{N-1}^{N-1} s[n-N+1]\big)$$

$$\check e_{N-1}[n-N] = s[n-N] - \big(a_1^{N-1} s[n-N+1] + \cdots + a_{N-1}^{N-1} s[n-1]\big)$$

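The above involves all samples of $s[n]$ from $n - N$ to $n$. And since the data are available only from $n = 1$ to $n = N_0$, to avoid running past the ends of the data in the time-average form of (76), we must limit the values of $n$ from $N + 1$ to $N_0$. This interval has $N_0 - N$ points; hence,

$$\hat\Gamma_N = \frac{\dfrac{1}{N_0 - N}\displaystyle\sum_{n=N+1}^{N_0} e_{N-1}[n]\,\check e_{N-1}[n-N]}{\dfrac{1}{2(N_0 - N)}\displaystyle\sum_{n=N+1}^{N_0}\big(e_{N-1}^2[n] + \check e_{N-1}^2[n-N]\big)} \qquad (77)$$

where the errors are now formed from the data with the estimated coefficients $\hat a_k^{N-1}$. The above ratio satisfies (75) because (Schwarz inequality)

$$\Big|\sum e_{N-1}[n]\,\check e_{N-1}[n-N]\Big| \le \sqrt{\sum e_{N-1}^2[n]\,\sum \check e_{N-1}^2[n-N]}$$

and

$$\sqrt{|x|\,|y|} \le \tfrac{1}{2}\big(|x| + |y|\big)$$

With $\hat\Gamma_N$ so determined, the $N$th order estimate of the unknown spectrum is given by

$$\hat S_N(\omega) = \frac{\hat P_N}{\big|1 - \sum_{k=1}^{N} \hat a_k^N e^{-jk\omega}\big|^2} \qquad (78)$$

where the coefficients are determined recursively as in (67):

$$\hat a_k^N = \hat a_k^{N-1} - \hat\Gamma_N\,\hat a_{N-k}^{N-1}, \quad 1 \le k \le N-1, \qquad \hat a_N^N = \hat\Gamma_N \qquad (79)$$

$$\hat P_N = (1 - \hat\Gamma_N^2)\,\hat P_{N-1} \qquad (80)$$

The recursion starts with the estimate (74) of $P_0$.

We conclude with a brief comment on the choice of $N$. This choice is dictated by two conflicting requirements. For a satisfactory approximation of the unknown spectrum $S(\omega)$ by an all-pole function $\hat S_N(\omega)$, $N$ should be as large as possible. However, in the estimate (77) of $\Gamma_N$, the number of terms in the time-average equals $N_0 - N$, and, as we know, this number should be large for the variance of the estimate to be small. Various schemes have been suggested for selecting $N$ [15], [16], but they will not be discussed here.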


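The estimate (77) of $\Gamma_N$ can be obtained by minimizing the time-average form

$$I_N = \frac{1}{2(N_0 - N)}\sum_{n=N+1}^{N_0}\Big[\big(e_{N-1}[n] - \Gamma_N\,\check e_{N-1}[n-N]\big)^2 + \big(\check e_{N-1}[n-N] - \Gamma_N\,e_{N-1}[n]\big)^2\Big]$$

of the MS error $P_N$ as given by (54). Setting $dI_N/d\Gamma_N = 0$, we obtain (77). However, the resulting value of $I_N$ does not equal the estimate $\hat P_N$ of $P_N$ obtained recursively from (80).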
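The random-case estimator (74)-(80) is what is now commonly called Burg's method. The NumPy sketch below (function name hypothetical) follows the equations above, keeping the forward errors and the delayed backward errors in two arrays and updating them with the lattice relations (40) and (50); the time average in (77) runs over $n = N+1, \ldots, N_0$. The synthetic AR(2) data ($c_1 = 0.9$, $c_2 = -0.5$, unit-power Gaussian noise, $N_0 = 20000$) are only an illustration.

```python
import numpy as np


def burg(x, N):
    """Maximum entropy (Burg) estimate from data x = (s[1], ..., s[N0]).

    P_0 from (74); each Gamma_m from the symmetric time average (77) over
    n = m+1, ..., N0; coefficients from (79) and P_m from (80). The error
    sequences are updated with the lattice relations (40) and (50).
    Returns (a_1^N..a_N^N, estimate of P_N, Gamma_1..Gamma_N).
    """
    x = np.asarray(x, dtype=float)
    P = np.mean(x ** 2)          # (74)
    a = np.zeros(0)
    # 0-based time t corresponds to n = t + 1; entries with t < m are stale
    f = x.copy()                 # f[t] = e_m[t]          (valid for t >= m)
    b = x.copy()                 # b[t] = e-check_m[t-m]  (valid for t >= m)
    gammas = []
    for m in range(1, N + 1):
        ef = f[m:]               # e_{m-1}[t]           for t = m, ..., N0-1
        eb = b[m - 1:-1]         # e-check_{m-1}[t-m]   for the same t
        g = 2 * (ef @ eb) / (ef @ ef + eb @ eb)        # (77)
        a = np.concatenate([a - g * a[::-1], [g]])      # (79)
        P *= 1 - g * g                                  # (80)
        f_next, b_next = f.copy(), b.copy()
        f_next[m:] = ef - g * eb                        # (40)
        b_next[m:] = eb - g * ef                        # (50)
        f, b = f_next, b_next
        gammas.append(g)
    return a, P, np.array(gammas)


# Illustrative data: AR(2) with c_1 = 0.9, c_2 = -0.5 and unit-power noise
rng = np.random.default_rng(1)
zeta = rng.normal(size=20_000)
s = np.zeros_like(zeta)
for n in range(len(s)):
    s[n] = zeta[n] + (0.9 * s[n - 1] if n >= 1 else 0.0) - (0.5 * s[n - 2] if n >= 2 else 0.0)
a_hat, P_hat, gammas = burg(s, 2)
```

On such data the estimates typically land close to $(0.9, -0.5)$ with $\hat P_2 \approx 1$, and every $|\hat\Gamma_N|$ stays below one, as the Schwarz argument guarantees.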
We shall finally show that the all-pole model is a consequence of the principle of maximum entropy. The required background is discussed in the Appendix. We repeat the problem: We are given the $N + 1$ values $R[0], \ldots, R[N]$ of the autocorrelation $R[m]$ of a random process $s[n]$, and we wish to estimate its power spectrum $S(\omega)$. The statistics of $s[n]$ are determined in terms of the joint density of the r.v.'s $s[n], s[n-1], \ldots, s[n-r]$. Hence, to apply the method of maximum entropy, we must determine the unknown values of $R[m]$ so as to maximize the entropy $H(s_0, \ldots, s_r)$ of these r.v.'s and find the limit as $r \to \infty$. This is equivalent to the maximization of the entropy rate $\bar H_s$ of $s[n]$ [see (A-12)], subject to the given constraints.

We shall show that $\bar{H}_s$ is maximum if $s[n]$ is a normal process with power spectrum as in (73).

We give three proofs. The first two involve the maximization of $\bar{H}_s$. In the third, we find $R[N+1]$ by maximizing $H(s_0, \ldots, s_{N+1})$, and, with $R[N+1]$ so determined, we continue the process. This method can be questioned because it does not yield the maximum of $H(s_0, \ldots, s_{N+1}, \ldots, s_{N+k})$ subject to the given constraints. However, the result is correct in the limit as $k \to \infty$.
Method 1. We form the $N$th order predictor $\hat{s}_N[n]$ of $s[n]$ and the predictor error

$$s[n] - \sum_{k=1}^{N} a_k^N s[n-k] = e_N[n] \tag{81}$$

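Clearly, $e_N[n]$ is the output of the error filter $H_N(z)$ [see (17)] with input $s[n]$. Hence [see (A-13)], its entropy rate $\bar{H}_e$ is given by

$$\bar{H}_e = \bar{H}_s + \frac{1}{4\omega_o}\int_{-\omega_o}^{\omega_o} \ln\left|H_N(e^{j\omega T})\right|^2 d\omega \tag{82}$$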

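To maximize $\bar{H}_s$, it therefore suffices to maximize $\bar{H}_e$, because the integral is specified in terms of the given values of $R[m]$ (the coefficients $a_k^N$, hence $H_N(z)$, depend only on $R[0], \ldots, R[N]$). As we know,

$$E\{e_N^2[n]\} = R[0] - \sum_{k=1}^{N} a_k^N R[k] \tag{83}$$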

Therefore, since the power (83) of $e_N[n]$ is fixed by the data, $\bar{H}_e$ is maximum if the process $e_N[n]$ is normal white noise (see Appendix). And since $e_N[n]$ is the right side of the recursion equation (81), we conclude that the optimum $s[n]$ is an AR process of order $N$; hence, its power spectrum is all-pole as in (73).

We note that the optimum $s[n]$ is a normal process because it is the output of the stable linear system $T(z) = 1/H_N(z)$ whose input is the normal process $e_N[n]$.
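Method 1 can be checked numerically. The sketch below assumes that (12) are the normal (Yule-Walker) equations $R[m] = \sum_k a_k^N R[m-k]$, $1 \le m \le N$, and solves them with the Levinson-Durbin recursion. It then evaluates the entropy rate of the resulting AR process in two ways: as $\ln\sqrt{2\pi e P_N}$, using (A-18) together with the fact that $H_N(z)$ is monic and minimum phase, so the integral in (82) vanishes by Theorem 1; and from (A-20) with the all-pole spectrum. We take $T = 1$, so $\omega_o = \pi$. The function names and the values in `R` are hypothetical.

```python
import numpy as np


def levinson(R):
    """Solve the order-N normal equations for a_1..a_N given R[0..N].

    Convention of (81): s[n] - sum_k a_k s[n-k] = e_N[n].
    Returns (a, P) with P = E{e_N^2[n]} as in (83).
    """
    R = np.asarray(R, dtype=float)
    a = np.zeros(0)
    P = R[0]
    for n in range(1, len(R)):
        # reflection coefficient K_n from the order-(n-1) solution
        k = (R[n] - np.dot(a, R[n - 1:0:-1])) / P
        a = np.concatenate([a - k * a[::-1], [k]])
        P *= 1.0 - k * k
    return a, P


def max_entropy_rate(R, n_grid=4096):
    """Entropy rate of the AR (all-pole) extension of R, computed two ways (T = 1)."""
    a, P = levinson(R)
    rate_from_error = 0.5 * np.log(2 * np.pi * np.e * P)   # ln sqrt(2 pi e P_N)
    omega = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    k = np.arange(1, len(a) + 1)
    H = 1.0 - np.exp(-1j * np.outer(omega, k)) @ a          # H_N(e^{j omega})
    S = P / np.abs(H) ** 2                                  # all-pole spectrum
    # (A-20): the integral over (-pi, pi) is approximated by a grid mean
    rate_from_spectrum = 0.5 * np.log(2 * np.pi * np.e) + 0.5 * np.mean(np.log(S))
    return a, P, rate_from_error, rate_from_spectrum


a, P, rate_e, rate_s = max_entropy_rate([1.0, 0.5, 0.1])
```

For these hypothetical lags the recursion gives $a_1^2 = 0.6$, $a_2^2 = -0.2$ and $P_2 = 0.72$, which also equals $R[0] - a_1 R[1] - a_2 R[2]$ as in (83).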

Method 2. In the following reasoning, we assume that the process $s[n]$ is normal. As we have just shown, this assumption is not restrictive. From the normality of $s[n]$ it follows that, within a constant, its entropy rate is given by [see (A-20)]

$$\bar{H}_s = \frac{1}{4\omega_o}\int_{-\omega_o}^{\omega_o} \ln S(\omega)\,d\omega \tag{84}$$


where

$$S(\omega) = \sum_{m=-\infty}^{\infty} R[m]\,e^{-jm\omega T} \tag{85}$$


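Since $R[m]$ is specified for $|m| \le N$, the above integral depends on the values of $R[m]$ for $|m| > N$, and it is maximum if

$$\frac{\partial \bar{H}_s}{\partial R[m]} = \frac{1}{4\omega_o}\int_{-\omega_o}^{\omega_o} \frac{e^{-jm\omega T}}{S(\omega)}\,d\omega = 0, \qquad |m| > N \tag{86}$$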


This shows that the Fourier series coefficients of the function $1/S(\omega)$ are zero for $|m| > N$. Hence,


$$\frac{1}{S(\omega)} = \sum_{k=-N}^{N} c_k\,e^{-jk\omega T} \tag{87}$$


And since $S(\omega) > 0$, it follows from the Fejér-Riesz theorem [12] that the above sum can be written as a squared magnitude. This yields


$$S(\omega) = \frac{1}{\left|\sum_{k=0}^{N} b_k\,e^{-jk\omega T}\right|^2} \tag{88}$$

We have thus shown that $S(\omega)$ is an all-pole function as in (73), where $P_N = 1/b_0^2$ and $a_k = -b_k/b_0$.
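The conclusion of Method 2 can be illustrated numerically. For an all-pole spectrum of order $N$, the Fourier coefficients $c_m$ of $1/S(\omega)$ should vanish for $|m| > N$ as in (87), and the factor with $b_0 = 1/\sqrt{P_N}$, $b_k = -a_k/\sqrt{P_N}$ should reproduce $1/S(\omega)$ as in (88). The sketch below uses hypothetical AR(2) values, $T = 1$, and the inverse FFT, which returns these coefficients exactly (up to rounding) for a trigonometric polynomial of degree well below half the grid size.

```python
import numpy as np

a = np.array([0.75, -0.5])   # hypothetical a_1, a_2 (N = 2)
P = 1.0                      # hypothetical error power P_N
M = 64                       # frequency samples on [0, 2 pi), T = 1
omega = 2 * np.pi * np.arange(M) / M
k = np.arange(1, len(a) + 1)
H = 1.0 - np.exp(-1j * np.outer(omega, k)) @ a   # H_N(e^{j omega})
inv_S = np.abs(H) ** 2 / P                        # 1/S(omega)

# c[m] = coefficient of e^{-j m omega} in 1/S; negative m wrap to index M - |m|
c = np.fft.ifft(inv_S).real

# Fejer-Riesz factor of (88): b_0 = 1/sqrt(P), b_k = -a_k/sqrt(P)
b = np.concatenate([[1.0], -a]) / np.sqrt(P)
factored = np.abs(np.exp(-1j * np.outer(omega, np.arange(len(b)))) @ b) ** 2
```

Only $c_0 = 1 + a_1^2 + a_2^2$, $c_{\pm 1} = -a_1 + a_1 a_2$ and $c_{\pm 2} = -a_2$ are nonzero here, which is the structure (87) requires with $N = 2$.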

Method 3. This method is iterative. In the first iteration, we determine $R[N+1]$ so as to maximize the entropy $H$ of the r.v. $s[n], s[n-1], \ldots, s[n-N-1]$. For this purpose, we start with the assumption that $R[N+1]$ is specified, and we determine the joint density of the above r.v. for maximum $H$. As we show in the Appendix, $H$ is maximum if these r.v. are jointly normal with zero mean. In this case [see (A-24)]


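$$H = \ln\sqrt{(2\pi e)^{N+2}\,\Delta} \tag{89}$$

where

$$\Delta = \begin{vmatrix} R[0] & R[1] & \cdots & R[N+1] \\ R[1] & R[0] & \cdots & R[N] \\ \vdots & \vdots & \ddots & \vdots \\ R[N+1] & R[N] & \cdots & R[0] \end{vmatrix}$$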

The above determinant is a concave quadratic in $R[N+1]$, non-negative over the admissible values of $R[N+1]$, and it is maximum if
$$R[N+1] = \sum_{k=1}^{N} a_k^N R[N+1-k] \tag{90}$$

where the coefficients $a_k^N$ satisfy (12). With $R[N+1]$ so determined, we continue the iteration, and at the $r$th step we determine $R[N+r]$ so as to maximize the entropy of the r.v.

$$s[n], \ldots, s[n-N-r] \tag{91}$$

This yields the extrapolation formula

$$R[N+r] = \sum_{k=1}^{N} a_k^N R[N+r-k] \tag{92}$$

The coefficients $a_k^{N+r}$ of the predictor of $s[n]$ of order $N+r$ again satisfy the system (12), where now $N$ is replaced by $N+r$. From this and (92) it follows that

$$a_k^{N+r} = 0, \qquad N < k \le N+r$$

This shows that the $N$th order predictor is also the predictor of any higher order; hence, $s[n]$ is an AR process of order $N$, and its power spectrum is an all-pole function as in (88) [see also (35) and (36)].
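The extrapolation (92) and its consequence can be checked directly: extend given lags $R[0..N]$ with (92), then solve the normal equations of a higher order on the extended sequence. The higher-order coefficients beyond $k = N$ should come out zero and the error power should not change. This sketch assumes, as before, that (12) are the Yule-Walker equations solved by the Levinson-Durbin recursion; `levinson`, `extrapolate` and the values in `R` are hypothetical.

```python
import numpy as np


def levinson(R):
    """Order-N normal equations; convention s[n] - sum_k a_k s[n-k] = e_N[n]."""
    R = np.asarray(R, dtype=float)
    a = np.zeros(0)
    P = R[0]
    for n in range(1, len(R)):
        k = (R[n] - np.dot(a, R[n - 1:0:-1])) / P
        a = np.concatenate([a - k * a[::-1], [k]])
        P *= 1.0 - k * k
    return a, P


def extrapolate(R, a, extra):
    """Extend R[0..N] by (92): R[N+r] = sum_k a_k R[N+r-k], r = 1..extra."""
    R = [float(x) for x in R]
    N = len(a)
    for _ in range(extra):
        m = len(R)   # index of the lag being added, i.e. N + r
        R.append(sum(a[k - 1] * R[m - k] for k in range(1, N + 1)))
    return np.array(R)


R = np.array([1.0, 0.5, 0.1])     # hypothetical given lags R[0..2], N = 2
a, P = levinson(R)
R_ext = extrapolate(R, a, extra=6)
a_ext, P_ext = levinson(R_ext)    # predictor of order N + 6
```

Each new reflection coefficient vanishes because its numerator is exactly the right side of (92) minus $R[N+r]$; the extended Toeplitz matrix therefore stays positive definite.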
APPENDIX


ENTROPY  AND  ENTROPY  RATE 


We  present  next  for  easy  reference  the  relevant  concepts  from  the 
theory  of  entropy. 

Consider a probability space $S$ and a partition $A$ of $S$, that is, a countable collection of mutually exclusive events $A_i$ whose union equals $S$.

Definition. The entropy $H(A)$ of $A$ is the sum

$$H(A) = -\sum_i p_i \ln p_i, \qquad p_i = P(A_i) \tag{A-1}$$


Thus, entropy is a number associated with each partition of a probability space. This number has the following significance. As we know, if the experiment is performed $n$ times and the event $A_i$ occurs $n_i$ times, then "almost certainly"

$$p_i \simeq \frac{n_i}{n} \tag{A-2}$$

provided that $n$ is "sufficiently large." This heuristic statement is the basis for the use of probability in real problems. It can be given a precise interpretation in the context of the law of large numbers [11].

We shall call each sequence of the form $t = \{A_i \text{ occurs } n_i \simeq np_i \text{ times in a specific order}\}$ typical. The union of all such sequences will be denoted by $T$. Clearly,

$$P(T) \simeq 1 \tag{A-3}$$

because, according to (A-2), the typical sequences occur "almost certainly." Each typical sequence is an event in the product space $S^n = S \times \cdots \times S$, and

$$P(t) = p_1^{n_1} p_2^{n_2} \cdots p_N^{n_N} \tag{A-4}$$

where $N$ is the number of events in the partition $A$.
Since $n_i \simeq np_i$ and $p_i = e^{\ln p_i}$, it follows from (A-1) that

$$P(t) \simeq e^{np_1 \ln p_1} \cdots e^{np_N \ln p_N} = e^{-nH(A)} \tag{A-5}$$


Hence, since $P(T) \simeq N_T\,e^{-nH(A)} \simeq 1$, the total number $N_T$ of typical sequences is given by [5]

$$N_T \simeq e^{nH(A)} \tag{A-6}$$


It follows readily from (A-1) that $H(A) \le \ln N$, with equality iff $p_i = 1/N$. And since the total number of sequences in $S^n$ equals $N^n$, we conclude that if the $p_i$'s are not all equal, then

$$N_T \simeq e^{nH(A)} \ll e^{n\ln N} = N^n$$

for large $n$. Thus, although $P(T) \simeq 1$, the number of sequences in $T$ is small compared with the total number $N^n$ of all possible sequences. It is this result that forms the basis for the applications of entropy. We shall use it to establish the conceptual equivalence between maximum entropy and the classical definition of probability.
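The counting argument can be checked with exact multinomial coefficients. The sketch below uses hypothetical probabilities $p = (1/2, 1/4, 1/4)$, so $N = 3$, and computes $\frac{1}{n}\ln$ of the number of sequences in which each $A_i$ occurs exactly $np_i$ times. This is one class of typical sequences, not all of $T$, but its exponent should already approach $H(A)$ and stay well below $\ln N$. The helper name `log_count` is hypothetical.

```python
import math


def log_count(counts):
    """ln of the number of sequences with the given occurrence counts (multinomial)."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


p = [0.5, 0.25, 0.25]                       # hypothetical p_i, N = 3
H = -sum(pi * math.log(pi) for pi in p)     # entropy (A-1)
rates = {}
for n in (100, 1000, 10000):
    counts = [round(n * pi) for pi in p]    # n_i = n p_i (exact for these n)
    rates[n] = log_count(counts) / n        # per-symbol exponent of the count
```

The gap between the exponent and $H(A)$ shrinks roughly like $(\ln n)/n$, consistent with the "$\simeq$" in (A-6).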

Suppose first that we wish to determine the $p_i$ in the absence of any prior information (no constraints). In this case, all sequences in $S^n$ are equally likely; hence, $N_T$ must be nearly equal to $N^n$ because $P(T) \simeq 1$. From this and (A-6), it follows that $H(A)$ must equal its maximum $\ln N$.

Suppose next that prior information is available in the form of inequality constraints or expected values. Such information leads to the condition that only certain sequences in the space $S^n$ are admissible, forming the subset $S_c^n$. All typical sequences are now in $S_c^n$, and since $P(T) \simeq 1$ and the sequences in $S_c^n$ are equally likely, $T$ must contain most of them; that is, $H(A)$ must be maximum subject to the given constraints.

The above argument is imprecise in the same sense as (A-2); however, as in that case, it can be given a precise interpretation as a limit theorem.
A consequence of the conceptual equivalence between maximum entropy and the classical definition is that the former is subject to the same critique as the latter. We should note in support of maximum entropy that, in most problems involving prior constraints, the classical definition must be applied not to the original space $S$ but to the vastly more complex space $S^n$, whereas maximum entropy deals only with quantities in $S$. This simplification is the primary reason for using maximum entropy. However, it is in such cases that the results are least reliable. We illustrate with the die experiment. In the absence of prior information, we reach the reasonable conclusion that $p_i = 1/6$. If we know, on the other hand, that the expected value of the zero-one r.v. associated with the event "one" equals 0.1998, say, then the conclusion is that $p_1 = 0.1998$ and $p_2 = p_3 = \cdots = p_6 = 0.16004$. Unlike the fair-die case, our trust in the correctness of these values is not great, although we have no other reasonable alternative.
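The die example can be reproduced in a few lines. With $p_1$ fixed at 0.1998, maximum entropy splits the remaining probability equally, $p_i = (1 - 0.1998)/5 = 0.16004$; the sketch below compares this choice against randomly drawn admissible alternatives (same $p_1$, arbitrary split of the rest). The Dirichlet sampling of alternatives is only an illustrative device of ours, not part of the argument above.

```python
import numpy as np


def entropy(p):
    """H = -sum p_i ln p_i, as in (A-1), ignoring zero-probability events."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))


p1 = 0.1998                                     # constraint from the example
p_maxent = np.array([p1] + [(1 - p1) / 5] * 5)  # equal split of the remainder

# Random admissible alternatives: same p1, arbitrary split of 1 - p1
rng = np.random.default_rng(0)
alternatives = [np.concatenate([[p1], (1 - p1) * rng.dirichlet(np.ones(5))])
                for _ in range(1000)]
best_alternative = max(entropy(q) for q in alternatives)
```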

We believe these observations are relevant to the application of the method to the spectral estimation problem. In our view, the method is popular not because it leads to an all-pole model as a logical imperative, but rather because the model is numerically simple and, unlike earlier methods, it can detect sharp peaks in the unknown spectra.

Random variables. Consider a discrete-type r.v. $x$ taking the values $x_i$ with probability $p_i$. Clearly, the events $\{x = x_i\}$ form a partition $A_x$. The entropy of the r.v. $x$ is by definition the entropy of this partition:

$$H(x) = H(A_x) = -\sum_i p_i \ln p_i \tag{A-7}$$

The entropy of a continuous-type r.v. $x$ cannot be so defined because the events $\{x = x_i\}$ do not form a partition (they are not countable). In this case a limiting argument is used: the r.v. $x$ is approximated by a discrete-type r.v. $x_\delta$ taking the values $x_i = i\delta$ with probability $p_i \simeq f(x_i)\,\delta$, where $f(x)$ is the density of $x$. As we see from (A-7),

$$H(x_\delta) \simeq -\sum_i \delta f(x_i) \ln f(x_i) - \ln\delta \sum_i \delta f(x_i)$$

Hence, $H(x_\delta) \to \infty$ as $\delta \to 0$. This is so because of the underlying assumption that the various values of $x$ can be recognized as distinct no matter how close they are. However,

$$H(x_\delta) + \ln\delta \to -\int_{-\infty}^{\infty} f(x)\ln f(x)\,dx \qquad \text{as } \delta \to 0$$


And it is this limit that is used as the entropy of $x$:

$$H(x) = -\int_{-\infty}^{\infty} f(x)\ln f(x)\,dx = -E\{\ln f(x)\} \tag{A-8}$$

The addition of the term $\ln\delta$ is a recognition of the fact that, in real problems, only values of $x$ whose difference exceeds a certain level can be considered as distinct.
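The limiting argument can be checked numerically for a standard normal density, whose entropy (A-8) is $\ln\sqrt{2\pi e} \approx 1.4189$. The sketch discretizes with $p_i = f(i\delta)\,\delta$ exactly as in the text; $H(x_\delta)$ itself grows as $\delta$ shrinks, while $H(x_\delta) + \ln\delta$ stays at the differential entropy. The helper name and the truncation at $|x| \le 12$ are assumptions of this sketch.

```python
import math


def discretized_entropy(delta, x_max=12.0):
    """H(x_delta) for a standard normal x, with p_i = f(i*delta)*delta."""
    f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    n = int(x_max / delta)          # truncate the (negligible) far tails
    H = 0.0
    for i in range(-n, n + 1):
        p = f(i * delta) * delta
        if p > 0:
            H -= p * math.log(p)
    return H


H_exact = 0.5 * math.log(2 * math.pi * math.e)   # (A-8) for a standard normal
results = {d: discretized_entropy(d) for d in (0.1, 0.01)}
```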

The joint entropy of the vector r.v. $x = (x_1, \ldots, x_r)$ with density $f(x_1, \ldots, x_r)$ is defined similarly:

$$H(x_1, \ldots, x_r) = -E\{\ln f(x_1, \ldots, x_r)\} \tag{A-9}$$


As we know [11], if $y = (y_1, \ldots, y_r)$ is a linear transformation of $x$, that is, if $y = Ax$ where $A$ is an $r$ by $r$ non-singular matrix, then

$$f(y_1, \ldots, y_r) = \frac{1}{|\det A|}\, f(x_1, \ldots, x_r)$$

hence,

$$H(y_1, \ldots, y_r) = -E\{\ln f(x_1, \ldots, x_r)\} + \ln|\det A| = H(x_1, \ldots, x_r) + \ln|\det A| \tag{A-10}$$
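(A-10) holds for any density, but it is easy to check in the normal case, where the entropy has the closed form (A-24) derived later in this Appendix: if $x$ has covariance $R_x$, then $y = Ax$ has covariance $A R_x A^t$, and the entropies differ by $\ln|\det A|$. The covariance and transformation below are hypothetical.

```python
import numpy as np


def normal_entropy(R):
    """Entropy in nats of a zero-mean normal vector with covariance R, as in (A-24)."""
    r = R.shape[0]
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (r * np.log(2 * np.pi * np.e) + logdet)


R_x = np.array([[2.0, 0.5], [0.5, 1.0]])    # hypothetical covariance of x
A = np.array([[1.0, 2.0], [0.0, 3.0]])      # hypothetical non-singular A, det A = 3
R_y = A @ R_x @ A.T                          # covariance of y = A x
gain = normal_entropy(R_y) - normal_entropy(R_x)
```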


Entropy rate. We shall finally define the entropy rate of a discrete stationary process $x[n]$. From the stationarity of $x[n]$ it follows that the joint entropy of the r.v. $x[n], \ldots, x[n-r+1]$ is independent of $n$. The entropy rate $\bar{H}_x$ of the process $x[n]$ is by definition the limit

$$\bar{H}_x = \lim_{r \to \infty} \frac{1}{r}\,H(x_1, \ldots, x_r) \tag{A-12}$$


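Suppose that $x[n]$ is the input to a stable causal system with delta response $h[n]$ and system function $H(z)$. If $x[n]$ is applied at $n = -\infty$, then the resulting response $y[n]$ is stationary with entropy rate $\bar{H}_y$.

Theorem 1. If $H(z)$ is minimum phase, then

$$\bar{H}_y = \bar{H}_x + \frac{1}{4\omega_o}\int_{-\omega_o}^{\omega_o} \ln\left|H(e^{j\omega T})\right|^2 d\omega \tag{A-13}$$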


Proof. We can assume, introducing if necessary a change in sign and an appropriate shift of the time origin, that $h[0] > 0$. If $x[n]$ is applied at $n = 0$, then the resulting response

$$y[n] = \sum_{k=0}^{n} x[n-k]\,h[k] \tag{A-14}$$

is not stationary. However, it tends to the stationary process $y[n]$ as $n \to \infty$. Clearly, (A-14) is a linear transformation of the r.v. $x_0 = x[0], \ldots, x_n = x[n]$ into the r.v. $y_0 = y[0], \ldots, y_n = y[n]$ of the form $y = Ax$, where

$$A = \begin{bmatrix} h[0] & 0 & \cdots & 0 \\ h[1] & h[0] & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h[n] & h[n-1] & \cdots & h[0] \end{bmatrix}, \qquad \det A = h^{n+1}[0]$$

Hence [see (A-10)],

$$H(y_0, \ldots, y_n) = H(x_0, \ldots, x_n) + (n+1)\ln h[0]$$

Dividing by $n+1$ and letting $n \to \infty$, we obtain

$$\bar{H}_y = \bar{H}_x + \ln h[0] \tag{A-15}$$
Therefore, to complete the proof of the theorem it suffices to show that the term in $h[0]$ in (A-15) equals the integral in (A-13). Since $|H(e^{j\omega T})|^2 = H(e^{j\omega T})\,H(e^{-j\omega T})$, we conclude with $z = e^{j\omega T}$ that

$$\int_{-\omega_o}^{\omega_o} \ln\left|H(e^{j\omega T})\right|^2 d\omega = \frac{1}{jT}\oint \ln\left[H(z)H(1/z)\right]\frac{dz}{z}$$

where the line integral is along the unit circle. But

$$\oint \ln H(z)\,\frac{dz}{z} = \oint \ln H(1/z)\,\frac{dz}{z}$$

hence it suffices to show that

$$\ln h[0] = \frac{1}{2\pi j}\oint \ln H(z)\,\frac{dz}{z} \tag{A-16}$$

From the assumption that $H(z)$ is minimum phase, it follows that the integrand in (A-16) is analytic for $|z| > 1$; hence, the circle of integration can be made arbitrarily large. And since $\ln H(z) \to \ln h[0]$ as $z \to \infty$, we conclude that the integral equals

$$\oint \ln h[0]\,\frac{dz}{z} = 2\pi j \ln h[0]$$

and (A-16) results.
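The identity at the heart of Theorem 1, $\ln h[0] = \frac{1}{4\omega_o}\int_{-\omega_o}^{\omega_o}\ln|H(e^{j\omega T})|^2 d\omega$, can be checked numerically, and so can the role of the minimum-phase assumption. The two hypothetical filters below, $2 - z^{-1}$ and $1 - 2z^{-1}$, have the same $|H|^2 = 5 - 4\cos\omega$ (with $T = 1$, $\omega_o = \pi$), but only the first is minimum phase, and only for it does the integral equal $\ln h[0]$.

```python
import numpy as np


def log_gain_integral(h, n_grid=4096):
    """(1/(4 omega_o)) * integral of ln|H(e^{j omega T})|^2 with T = 1 (omega_o = pi).

    H(z) = sum_k h[k] z^{-k}; the integral is approximated by the mean over a
    uniform grid, which is very accurate for smooth periodic integrands.
    """
    omega = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    H = np.polyval(np.asarray(h, dtype=float)[::-1], np.exp(-1j * omega))
    return 0.5 * np.mean(np.log(np.abs(H) ** 2))


h_min = [2.0, -1.0]   # H(z) = 2 - z^{-1}: zero at z = 0.5, minimum phase
h_max = [1.0, -2.0]   # H(z) = 1 - 2 z^{-1}: zero at z = 2, not minimum phase

lhs_min, rhs_min = np.log(h_min[0]), log_gain_integral(h_min)
lhs_max, rhs_max = np.log(h_max[0]), log_gain_integral(h_max)
```

Both integrals equal $\ln 2$, but $\ln h[0]$ is $\ln 2$ only for the minimum-phase filter; for the other it is 0, so (A-13) would fail without the assumption.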

Normal processes. If $x$ is a normal r.v. with

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$$

then $H(x) = -E\{\ln f(x)\} = \ln\sigma\sqrt{2\pi e}$.

If $v[n]$ is normal white noise with $E\{v^2[n]\} = P$, then

$$f(v_1, \ldots, v_r) = f(v_1) \cdots f(v_r)$$

hence

$$H(v_1, \ldots, v_r) = -E\{\ln[f(v_1) \cdots f(v_r)]\} = r\ln\sqrt{2\pi e P} \tag{A-17}$$
From this and (A-12), it follows that if $v[n]$ is normal white noise, then

$$\bar{H}_v = \ln\sqrt{2\pi e P} \tag{A-18}$$


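Theorem 2. If $x[n]$ is a normal process with power spectrum $S(\omega)$ such that

$$\int_{-\omega_o}^{\omega_o} \left|\ln S(\omega)\right| d\omega < \infty \tag{A-19}$$

then its entropy rate is given by

$$\bar{H}_x = \ln\sqrt{2\pi e} + \frac{1}{4\omega_o}\int_{-\omega_o}^{\omega_o} \ln S(\omega)\,d\omega \tag{A-20}$$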


Proof. Since $x[n]$ is normal, all its statistical properties, including its entropy rate [see (A-12)], can be expressed in terms of its autocorrelation $R[m]$. From this it follows that if another normal process $y[n]$ has the same autocorrelation $R[m]$, then its entropy rate $\bar{H}_y$ will equal $\bar{H}_x$. Since $S(\omega)$ is an even, positive function and it satisfies the discrete form (A-19) of the Paley-Wiener condition [22], it can be factored into a product [12]

$$S(\omega) = H(e^{j\omega T})\,H(e^{-j\omega T}) \tag{A-21}$$

where $H(z)$ is the system function of a real causal minimum-phase system. Using as input to this system a white-noise normal process with zero mean and variance one, we obtain as output a normal process $y[n]$ with autocorrelation $R[m]$ and entropy rate [see (A-13) and (A-18)]

$$\bar{H}_y = \ln\sqrt{2\pi e} + \frac{1}{4\omega_o}\int_{-\omega_o}^{\omega_o} \ln\left|H(e^{j\omega T})\right|^2 d\omega$$

and (A-20) follows from (A-21).
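Theorem 2 can be checked on a process whose answer is known. For a hypothetical normal AR(1) process $x[n] = a\,x[n-1] + v[n]$ with $E\{v^2[n]\} = \sigma^2$, (A-20) should give $\ln\sqrt{2\pi e\sigma^2}$, the entropy rate of its innovations. The sketch compares that with the finite-$r$ quantity $\frac{1}{r}H(x_1, \ldots, x_r)$ from (A-24), using $R[m] = \sigma^2 a^{|m|}/(1 - a^2)$, and takes $T = 1$ ($\omega_o = \pi$). The names `toeplitz`, `a`, `sigma2` and `r` are our own.

```python
import numpy as np


def toeplitz(R):
    """Symmetric Toeplitz matrix with entries R[|i - j|]."""
    R = np.asarray(R, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(len(R)), np.arange(len(R))))
    return R[idx]


a, sigma2 = 0.6, 2.0      # hypothetical AR(1): x[n] = a x[n-1] + v[n], E{v^2} = sigma2
r = 200                   # number of samples x_1..x_r

# Finite-r entropy per sample from (A-24), approaching the rate (A-12)
R = sigma2 * a ** np.arange(r) / (1 - a ** 2)
sign, logdet = np.linalg.slogdet(toeplitz(R))
rate_finite = 0.5 * (np.log(2 * np.pi * np.e) + logdet / r)

# (A-20) with T = 1 (omega_o = pi); the integral is approximated by a grid mean
omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
S = sigma2 / np.abs(1 - a * np.exp(-1j * omega)) ** 2
rate_formula = 0.5 * np.log(2 * np.pi * np.e) + 0.5 * np.mean(np.log(S))
```

The finite-$r$ value exceeds the limit by roughly $\ln\!\big(1/(1-a^2)\big)/(2r)$, which vanishes as $r \to \infty$.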

Maximum entropy with constraints. The solution of problems involving maximum entropy with constraints in the form of expected values is a simple consequence of the following inequality: If $f(x)$ and $g(x)$ are two arbitrary density functions, then

$$-\int_{-\infty}^{\infty} f(x)\ln g(x)\,dx \ge -\int_{-\infty}^{\infty} f(x)\ln f(x)\,dx \tag{A-22}$$

with equality iff $f(x) = g(x)$. Indeed, as is easy to see, $\ln y \le y - 1$, and (A-22) follows readily with $y = g(x)/f(x)$. The above holds also if $f(x)$ and $g(x)$ are replaced by joint densities of any order.

Using (A-22), we shall determine the density $f(x)$ of a r.v. $x$ so as to maximize its entropy $H(x)$ subject to the constraint

$$E\{x^2\} = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \sigma^2$$

With

$$g(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$$

it follows from (A-22) that

$$-\int f(x)\ln f(x)\,dx \le -\int f(x)\left(-\frac{x^2}{2\sigma^2} - \ln\sigma\sqrt{2\pi}\right)dx = \frac{1}{2} + \ln\sigma\sqrt{2\pi}$$

The left side equals $H(x)$ and the right side is specified; hence, $H(x)$ is maximum iff $f(x) = g(x)$, that is, if $x$ is normal with zero mean.
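The bound can be compared with the entropies of two other zero-mean densities that satisfy the same constraint $E\{x^2\} = \sigma^2$: a uniform density on $(-c, c)$ with $c^2/3 = \sigma^2$ (entropy $\ln 2c$) and a Laplace density with scale $b$, $2b^2 = \sigma^2$ (entropy $1 + \ln 2b$). These closed forms are standard; the choice of comparison densities is ours. The normal density attains the bound, and the others fall below it.

```python
import math

sigma = 1.0   # sqrt of the given E{x^2}; all densities below have zero mean

# normal: ln(sigma sqrt(2 pi e)), which equals the bound 1/2 + ln(sigma sqrt(2 pi))
H_normal = math.log(sigma * math.sqrt(2 * math.pi * math.e))
bound = 0.5 + math.log(sigma * math.sqrt(2 * math.pi))

# uniform on (-c, c): E{x^2} = c^2 / 3, entropy ln(2c)
c = sigma * math.sqrt(3.0)
H_uniform = math.log(2 * c)

# Laplace with scale b: E{x^2} = 2 b^2, entropy 1 + ln(2b)
b = sigma / math.sqrt(2.0)
H_laplace = 1 + math.log(2 * b)
```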

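We now wish to find the joint density $f(x_1, \ldots, x_r)$ of the r.v. $x_1, \ldots, x_r$ so as to maximize their entropy $H(x_1, \ldots, x_r)$ subject to the constraints

$$E\{x_i x_j\} = R_{ij}, \qquad i, j = 1, \ldots, r \tag{A-23}$$

With

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_r \end{bmatrix}, \qquad \mathbf{R} = \begin{bmatrix} R_{11} & \cdots & R_{1r} \\ \vdots & \ddots & \vdots \\ R_{r1} & \cdots & R_{rr} \end{bmatrix}, \qquad \Delta = \det\mathbf{R}$$

$$g(X) = \frac{1}{\sqrt{(2\pi)^r \Delta}}\,e^{-\frac{1}{2}X^t \mathbf{R}^{-1} X}$$

we conclude, applying the multidimensional form of (A-22), that $f(X) = g(X)$ and

$$H(x_1, \ldots, x_r) = \ln\sqrt{(2\pi e)^r \Delta} \tag{A-24}$$

We similarly conclude that if $R_{ij}$ is specified for $i = j$ only, then $H(x_1, \ldots, x_r)$ is maximum if the r.v. are normal, independent, with zero mean and variance $R_{ii}$.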
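The last statement can be seen from (A-24): with only the variances $R_{ii}$ fixed, the determinant of a covariance matrix is at most the product of its diagonal entries (Hadamard's inequality), with equality for a diagonal matrix, that is, for independent normal r.v. A sketch with a hypothetical covariance matrix:

```python
import numpy as np


def normal_entropy(R):
    """Entropy in nats of a zero-mean normal vector with covariance R, as in (A-24)."""
    r = R.shape[0]
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (r * np.log(2 * np.pi * np.e) + logdet)


R = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.6],
              [0.2, 0.6, 1.0]])                 # hypothetical full set of R_ij
H_full = normal_entropy(R)                      # correlated normal r.v.
H_indep = normal_entropy(np.diag(np.diag(R)))   # same R_ii, independent r.v.
```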

From the above and (A-12) it follows that if $x[n]$ is a random process with given average power $E\{x^2[n]\} = \sigma^2$, then its entropy rate is maximum if $x[n]$ is normal white noise.
Figure Captions

Fig. 1. Forward predictor filter $H_N(z)$. $\hat{s}_N[n]$: predictor of $s[n]$; $e_N[n]$: predictor error.

Fig. 2. Lattice filter. $e_N[n]$: forward error; $\hat{e}_N[n]$: backward error.

Fig. 3. Cascade of AR filter $T(z) = 1/H_N(z)$ and predictor filter


